RISC CPU PROCESSOR


Integrated Device Technology, Inc. 





FEATURES: 


• Enhanced instruction set compatible version of the
  IDT79R2000 RISC CPU.

• Full 32-bit Operation: Thirty-two 32-bit registers and all
  instructions and addresses are 32-bit.

• Efficient Pipelining: The CPU's 5-stage pipeline design
  assists in obtaining an execution rate approaching one
  instruction per cycle. Pipeline stalls and exceptions are
  handled precisely and efficiently.

• On-Chip Cache Control: The IDT79R3000 provides a high
  bandwidth memory interface that handles separate external
  Instruction and Data Caches ranging in size from 4 to
  256 Kbytes each. Both caches are accessed during a
  single CPU cycle. All cache control is on-chip.

• On-Chip Memory Management Unit: A fully-associative, 64
  entry Translation Lookaside Buffer (TLB) provides fast
  address translation for virtual-to-physical memory mapping of
  the 4 Gigabyte virtual address space.

• Coprocessor Interface: The IDT79R3000 generates all
  addresses and handles memory interface control for up to
  three additional tightly coupled external processors.

• Optimizing Compilers are available for C, Fortran, Pascal,
  COBOL, Ada, and PL/1.

• UNIX™ System V.3 and BSD 4.3 operating systems
  supported.

• High-speed CEMOS™ technology.

• Instruction set compatible with the IDT79R2000 RISC CPU.

• 16.7MHz, 20MHz, 25MHz and 33MHz clock rates yield up to
  28 MIPS sustained throughput.

• Supports independent multiword block refill of both the
  instruction and data caches with variable block sizes.

• Supports concurrent refill and execution of instructions.

• Partial word stores executed as read-modify-write operations.

• 6 external interrupt inputs (up to 64 different sources), 2
  software interrupts, with single cycle latency to exception
  handler routine.

• Flexible multiprocessing support on chip with no impact on
  uniprocessor designs.

• Military product compliant to MIL-STD-883, Class B.


DESCRIPTION: 


The IDT 79R3000 RISC Microprocessor consists of two tightly-coupled
processors integrated on a single chip. The first processor is a full
32-bit CPU based on RISC (Reduced Instruction Set Computer) principles
to achieve a new standard of microprocessor performance. The second
processor is a system control coprocessor, called CP0, containing a
fully-associative 64 entry TLB (Translation Lookaside Buffer), MMU
(Memory Management Unit) and control registers, supporting a
4 Gigabyte virtual memory subsystem, and a Harvard Architecture Cache
Controller achieving a bandwidth of over 260 Mbytes/second using
industry standard static RAMs.

This data sheet provides an overview of the features and architecture
of the 79R3000 CPU, Revision 2.0. A more detailed description of the
operation of the device is incorporated in the "R3000 Family Hardware
User Manual", and a more detailed architectural overview is provided
in the "MIPS RISC Architecture" book, both available from IDT.
Documentation providing details of the software and development
environments supporting this processor is also available from IDT.
IDT79R3000 CPU Registers

The IDT 79R3000 CPU provides 32 general purpose 32-bit registers, a
32-bit Program Counter, and two 32-bit registers that hold the results
of integer multiply and divide operations. Only two of the 32 general
registers have a special purpose: register r0 is hardwired to the
value "0", which is a useful constant, and register r31 is used as the
link register in jump-and-link instructions (return address for
subroutine calls).

The CPU registers are shown in Figure 2. Note that there is no
Program Status Word (PSW) register shown in this figure: the
functions traditionally provided by a PSW register are instead
provided in the Status and Cause registers incorporated within the
System Control Coprocessor (CP0).
Figure 2. IDT79R3000 CPU Registers
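
The register model described above can be illustrated with a small software sketch. This is not IDT code: the class and function names are hypothetical, and the split of a 64-bit product into an upper half (HI) and a lower half (LO) follows the usual MIPS-I convention rather than anything spelled out in this data sheet.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Hypothetical model: 32 general registers plus the HI/LO result pair."""

    def __init__(self):
        self._regs = [0] * 32
        self.hi = 0
        self.lo = 0

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        # Register 0 is hardwired to zero, so writes to it are discarded.
        if n != 0:
            self._regs[n] = value & MASK32


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rf, rs, rt):
    """Signed multiply; assumed convention: HI = upper 32 bits, LO = lower 32."""
    product = to_signed32(rf.read(rs)) * to_signed32(rf.read(rt))
    product &= 0xFFFFFFFFFFFFFFFF  # keep the 64-bit two's-complement pattern
    rf.hi = product >> 32
    rf.lo = product & MASK32
```

A write to register 0 leaves it reading as zero, which is what makes it usable as a constant source operand in any instruction.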


Instruction Set Overview 

All IDT 79R3000 instructions are 32 bits long, and there are only
three instruction formats. This approach simplifies instruction
decoding, thus minimizing instruction execution time. The 79R3000
processor initiates a new instruction on every run cycle, and is able
to complete an instruction on almost every clock cycle. The only
exceptions are the Load instructions and Branch instructions, which
each have a single cycle of latency associated with their execution.
Note, however, that in the majority of cases the compilers are able to
fill these latency cycles with useful instructions which do not
require the result of the previous instruction. This effectively
eliminates these latency effects.

The actual instruction set of the CPU was determined after extensive
simulations to determine which instructions should be implemented in
hardware, and which operations are best synthesized in software from
other basic instructions. This methodology resulted in the R3000
having the highest performance of any available microprocessor.


I-Type (Immediate)
 31    26 25   21 20   16 15                        0
|   op   |  rs   |  rt   |        immediate          |

J-Type (Jump)
 31    26 25                                        0
|   op   |                 target                    |

R-Type (Register)
 31    26 25   21 20   16 15   11 10    6 5         0
|   op   |  rs   |  rt   |  rd   |  re   |  funct    |


Figure 3. IDT79R3000 Instruction Formats
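
Because every instruction is 32 bits wide and the fields sit at fixed bit positions, decoding reduces to a few shifts and masks. The sketch below extracts every field of all three formats from one word, using the bit boundaries shown in Figure 3 (31-26, 25-21, 20-16, 15-11, 10-6, 5-0). The names `Fields` and `decode` are illustrative only, and which fields are meaningful depends on the format selected by `op`.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    op: int         # bits 31..26, all formats
    rs: int         # bits 25..21, I-type and R-type
    rt: int         # bits 20..16, I-type and R-type
    rd: int         # bits 15..11, R-type
    re: int         # bits 10..6,  R-type
    funct: int      # bits 5..0,   R-type
    immediate: int  # bits 15..0,  I-type
    target: int     # bits 25..0,  J-type


def decode(word):
    """Split a 32-bit instruction word into the fields of Figure 3."""
    word &= 0xFFFFFFFF
    return Fields(
        op=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        re=(word >> 6) & 0x1F,
        funct=word & 0x3F,
        immediate=word & 0xFFFF,
        target=word & 0x03FFFFFF,
    )
```

For example, `decode(0x8C880004)` yields `op = 0x23`, `rs = 4`, `rt = 8` and `immediate = 4`. Reading `0x23` as Load Word relies on the standard MIPS-I opcode assignment, which this data sheet does not list.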
The IDT79R3000 instruction set can be divided into the following
groups:

• Load/Store instructions move data between memory and
  general registers. They are all I-type instructions, since the only
  addressing mode supported is base register plus 16-bit, signed
  immediate offset.

  The Load instruction has a single cycle of latency, which means
  that the data being loaded is not available to the instruction
  immediately after the load instruction. The compiler will fill this
  delay slot with either an instruction which is not dependent on
  the loaded data, or with a NOP instruction. There is no latency
  associated with the store instruction.

  Loads and Stores can be performed on byte, half-word, word, or
  unaligned word data (32-bit data not aligned on a modulo-4
  address). The CPU cache is constructed as a write-through
  cache.

• Computational instructions perform arithmetic, logical and
  shift operations on values in registers. They occur in both R-type
  (both operands and the result are registers) and I-type (one
  operand is a 16-bit immediate) formats.

  Note that computational instructions are three operand
  instructions; that is, the result of the operation can be stored into
  a different register than either of the two operands. This means
  that operands need not be overwritten by arithmetic operations.
  This results in a more efficient use of the large register set.

• Jump and Branch instructions change the control flow of a
  program. Jumps are either to a paged absolute address
  formed by combining a 26-bit target with four bits of the Program
  Counter (J-type format, for subroutine calls), or to a 32-bit byte
  address held in a register (R-type, for returns and dispatches).
  Branches have 16-bit offsets relative to the program counter
  (I-type). Jump and Link instructions save a return address in
  Register 31. The 79R3000 instruction set features a number of
  branch conditions. Included is the ability to compare a register to
  zero and branch, and also the ability to branch based on a
  comparison between two registers. Thus, net performance is
  increased since software does not have to perform arithmetic
  instructions prior to the branch to set up the branch conditions.

• Coprocessor instructions perform operations in the
  coprocessors. Coprocessor Loads and Stores are I-type.
  Coprocessor computational instructions have coprocessor-
  dependent formats (see coprocessor manuals).

• Coprocessor 0 instructions perform operations on the System
  Control Coprocessor (CP0) registers to manipulate the memory
  management and exception handling facilities of the processor.

• Special instructions perform a variety of tasks, including
  movement of data between special and general registers,
  system calls, and breakpoint. They are always R-type.
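
The two ways of forming a target address described for Jump and Branch instructions can be sketched as follows. Two details are assumptions taken from the usual MIPS-I convention rather than from this overview: the target and offset count words (so they are shifted left by two), and the program counter value used is the address of the instruction following the jump or branch. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jump_target(next_pc, target26):
    """J-type: top four PC bits combined with the 26-bit target (word address)."""
    return (next_pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_target(next_pc, offset16):
    """I-type: signed 16-bit word offset added to the program counter."""
    return (next_pc + (sign_extend16(offset16) << 2)) & MASK32
```

Under these assumptions a J-type jump can reach any word within the current 256 Mbyte region (a 28-bit byte address), while a branch reaches roughly 128 Kbytes either side of the program counter.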


Table 1 lists the instruction set of the IDT79R3000 processor. 
OP        Description

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Coprocessor Instructions
LWCz      Load Word from Coprocessor
SWCz      Store Word to Coprocessor
MTCz      Move To Coprocessor
MFCz      Move From Coprocessor
CTCz      Move Control to Coprocessor
CFCz      Move Control From Coprocessor
COPz      Coprocessor Operation
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False

System Control Coprocessor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception


Table 1. IDT79R3000 Instruction Summary
IDT79R3000 System Control Coprocessor (CP0)

The IDT79R3000 can operate with up to four tightly-coupled
coprocessors (designated CP0 through CP3). The System Control
Coprocessor (or CP0) is incorporated on the IDT79R3000 chip
and supports the virtual memory system and exception handling
functions of the IDT79R3000. The virtual memory system is
implemented using a Translation Lookaside Buffer and a group of
programmable registers as shown in Figure 4.

MILITARY AND COMMERCIAL TEMPERATURE RANGES
Figure 4. The System Coprocessor Registers


System Control Coprocessor (CP0) Registers

The CP0 registers shown in Figure 4 are used to control the
memory management and exception handling capabilities of the
IDT79R3000. Table 2 provides a brief description of each register.


REGISTER    DESCRIPTION

EntryHi     High half of a TLB entry
EntryLo     Low half of a TLB entry
Index       Programmable pointer into TLB array
Random      Pseudo-random pointer into TLB array
Status      Mode, interrupt enables, and diagnostic status info
Cause       Indicates nature of last exception
EPC         Exception Program Counter
Context     Pointer into kernel's virtual Page Table Entry array
BadVA       Most recent bad virtual address
PRid        Processor revision identification (Read only)


Table 2. System Control Coprocessor (CP0) Registers
Memory Management System

The IDT79R3000 has an addressing range of 4 Gbytes. However, since
most IDT79R3000 systems implement a physical memory smaller than
4 Gbytes, the IDT79R3000 provides for the logical expansion of memory
space by translating addresses composed in a large virtual address
space into available physical memory addresses. The 4 Gbyte address
space is divided into 2 Gbytes which can be accessed by both the
users and the kernel, and 2 Gbytes for the kernel only.


The TLB (Translation Lookaside Buffer)

Virtual memory mapping is assisted by the Translation
Lookaside Buffer (TLB). The on-chip TLB provides very fast virtual
memory access and is well-matched to the requirements of
multitasking operating systems. The fully-associative TLB contains
64 entries, each of which maps a 4-Kbyte page, with controls for
read/write access, cacheability, and process identification. The
TLB allows each user to access up to 2 Gbytes of virtual address
space.

Figure 5 illustrates the format of each TLB entry. The Translation 
operation involves matching the current Process ID (PID) and up- 
per 20 bits of the address against PID and VPN (Virtual Page Num- 
ber) fields in the TLB. When both match (or the TLB entry is 
Global), the VPN is replaced with the PFN (Physical Frame Num- 
ber) to form the physical address. 

TLB misses are handled in software, with the entry to be re- 
placed determined by a simple RANDOM function. The routine to 
process a TLB miss in the UNIX environment requires only 10-12 
cycles, which compares favorably with many CPUs which perform 
the operation in hardware. 


TLB ENTRY FORMAT

[Figure: bit layout of a TLB entry (EntryHi and EntryLo halves)]

VPN     - Virtual Page Number
TLBPID  - Process ID
PFN     - Physical Frame Number
N       - Non-cacheable flag
D       - Dirty flag (Write protect)
V       - Valid entry flag
G       - Global flag (ignore PID)
0       - Reserved


Figure 5. TLB Entry Format
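
The matching rule described above (compare the PID and the upper 20 address bits, honor the Global flag, then substitute the PFN for the VPN) can be sketched in software. This is a simplified model, not the hardware algorithm: the real TLB compares all 64 entries in parallel, and here both a miss and a match on an invalid entry simply return `None`, whereas the processor raises an exception for software to handle. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

PAGE_SHIFT = 12                       # 4-Kbyte pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1   # low 12 bits pass through untranslated


@dataclass(frozen=True)
class TlbEntry:
    vpn: int                 # virtual page number (upper 20 address bits)
    pid: int                 # process ID the entry belongs to
    pfn: int                 # physical frame number
    valid: bool = True       # V flag
    is_global: bool = False  # G flag: match regardless of PID


def translate(tlb: Sequence[TlbEntry], vaddr: int, pid: int) -> Optional[int]:
    """Return the physical address for vaddr, or None if no usable entry."""
    vpn = (vaddr & 0xFFFFFFFF) >> PAGE_SHIFT
    for entry in tlb:
        if entry.vpn == vpn and (entry.is_global or entry.pid == pid):
            if not entry.valid:
                return None
            return (entry.pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK)
    return None
```

In this model a page mapped for process 3 is invisible to process 4, while a Global entry translates for every process.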


IDT79R3000 Operating Modes 

The IDT79R3000 has two operating modes: User mode and Kernel
mode. The IDT79R3000 normally operates in the User mode
until an exception is detected, forcing it into the Kernel mode. It
remains in the Kernel mode until a Restore From Exception (RFE)
instruction is executed. The manner in which memory addresses
are translated or mapped depends on the operating mode of
the IDT79R3000. Figure 6 shows the MMU translation performed
for each of the operating modes.
[Figure: MMU address translation, virtual -> physical]

Figure 6. IDT79R3000 Virtual Address Mapping


User Mode: in this mode, a single, uniform virtual address
space (kuseg) of 2 Gbytes is available. Each virtual address is
extended with a 6-bit process identifier field to form unique virtual
addresses for up to 64 processes. All references to this segment are
mapped through the TLB. Use of the cache is determined by bit
settings for each page within the TLB entries.

Kernel Mode: four separate segments are defined in this
mode:

• kuseg: when in the kernel mode, references to this segment
  are treated just like user mode references, thus streamlining
  kernel access to user data.

• kseg0: references to this 512 Mbyte segment use cache
  memory but are not mapped through the TLB. Instead, they
  always map to the first 0.5 Gbytes of physical address space.

• kseg1: references to this 512 Mbyte segment are not mapped
  through the TLB and do not use the cache. Instead, they are
  hard-mapped into the same 0.5 Gbyte segment of physical
  address space as kseg0.

• kseg2: references to this 1 Gbyte segment are always mapped
  through the TLB and use of the cache is determined by bit
  settings within the TLB entries.
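
The segment rules above can be summarized as a small classification function. The segment sizes come from the text; the base addresses used here (kseg0 at 0x80000000, kseg1 at 0xA0000000, kseg2 at 0xC0000000) follow the standard MIPS-I layout and are an assumption as far as this section is concerned. The names are hypothetical, and `PermissionError` merely stands in for the exception the processor would take on a user-mode reference to kernel space.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    name: str
    tlb_mapped: bool
    cached: Optional[bool]   # None: decided per page by the TLB entry
    physical: Optional[int]  # None: a TLB lookup is required


def classify(vaddr: int, kernel_mode: bool) -> Segment:
    """Classify a 32-bit virtual address by segment (assumed MIPS-I bases)."""
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:                      # 2 Gbytes, user and kernel
        return Segment("kuseg", True, None, None)
    if not kernel_mode:
        raise PermissionError("upper 2 Gbytes are kernel-only")
    if vaddr < 0xA0000000:                      # 512 Mbytes, cached, unmapped
        return Segment("kseg0", False, True, vaddr - 0x80000000)
    if vaddr < 0xC0000000:                      # 512 Mbytes, uncached, unmapped
        return Segment("kseg1", False, False, vaddr - 0xA0000000)
    return Segment("kseg2", True, None, None)   # 1 Gbyte, mapped
```

Note that a kseg0 address and the kseg1 address 0x20000000 above it resolve to the same physical location, one through the cache and one around it.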


IDT79R3000 Pipeline Architecture

The execution of a single IDT79R3000 instruction consists of five
primary steps:

1) IF  - Fetch the instruction (I-Cache).
2) RD  - Read any required operands from CPU registers
         while decoding the instruction.
3) ALU - Perform the required operation on instruction
         operands.
4) MEM - Access memory (D-Cache).
5) WB  - Write back results to register file.

Each of these steps requires approximately one CPU cycle as
shown in Figure 7 (parts of some operations overlap into another
cycle while other operations require only 1/2 cycle).


[Figure: Instruction Execution. One instruction passes through the
IF (I-Cache), RD, ALU, MEM (D-Cache) and WB stages; a marker shows
the length of one cycle.]

Figure 7. IDT79R3000 Instruction Pipeline
The IDT79R3000 uses a 5-stage pipeline to achieve an instruction
execution rate approaching one instruction per CPU cycle. Thus, the
execution of five instructions at a time is overlapped, as shown in
Figure 8.


[Figure: IDT79R3000 Instruction Pipeline (5-deep). Successive
instructions along the instruction-flow axis each pass through
IF, RD, ALU, MEM and WB, staggered by one cycle; a marker shows
the current CPU cycle.]

Figure 8. IDT79R3000 Execution Sequence


This pipeline operates efficiently because different CPU re- 
sources (address and data bus accesses, ALU operations, regis- 
ter accesses, and so on) are utilized on a non-interfering basis. 
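
The overlap can be made concrete with a small scheduling sketch. It models only the ideal case, with no stalls or exceptions, where instruction i enters IF in cycle i and advances one stage per cycle. The function names are hypothetical.

```python
STAGES = ("IF", "RD", "ALU", "MEM", "WB")


def schedule(n_instructions):
    """For each instruction, map cycle number -> stage occupied (ideal case)."""
    return [
        {issue + offset: stage for offset, stage in enumerate(STAGES)}
        for issue in range(n_instructions)
    ]


def total_cycles(n_instructions):
    """Cycles needed to finish n instructions when nothing stalls."""
    return n_instructions + len(STAGES) - 1 if n_instructions else 0


def render(n_instructions):
    """Text diagram with one row per instruction and one column per cycle."""
    width = total_cycles(n_instructions)
    rows = []
    for number, stages in enumerate(schedule(n_instructions), start=1):
        cells = [stages.get(cycle, "").ljust(3) for cycle in range(width)]
        rows.append(f"I{number}: " + " ".join(cells).rstrip())
    return "\n".join(rows)
```

In this model five instructions finish in 9 cycles rather than 25, and from the fifth cycle onward one instruction completes every cycle, which is the sense in which the rate approaches one instruction per cycle.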


Memory System Hierarchy 

The high performance capabilities of the IDT79R3000 processor 
demand system configurations incorporating techniques fre- 
quently employed in large, mainframe computers but seldom en- 
countered in systems based on more traditional microprocessors. 

A primary goal of systems employing RISC techniques is to minimize
the average number of cycles each instruction requires for
execution. In order to achieve this goal, RISC processors
incorporate a number of techniques including a compact and uniform
instruction set, a deep instruction pipeline (as described
above), and utilization of optimizing compilers. Many of the
advantages obtained from these techniques can, however, be
negated by an inefficient memory system.

Figure 9 illustrates memory in a simple microprocessor system. 
In this system, the CPU outputs addresses to memory and reads 
instructions and data from memory or writes data to memory. The 
address space is completely undifferentiated: instructions, data, 
and I/O devices are all treated the same. In such a system, a pri- 
mary limiting performance factor is memory bandwidth. 
Figure 9. A Simple Microprocessor Memory System


Figure 10 illustrates a memory system that supports the signifi- 
cantly greater memory bandwidth required to take full advan- 
tage of the IDT79R3000's performance capabilities. The key fea- 
tures of this system are: 
• External Cache Memory: Local, high-speed memory (called
  cache memory) is used to hold instructions and data that are
  repetitively accessed by the CPU (for example, within a
  program loop) and thus reduces the number of references that
  must be made to the slower-speed main memory. Some
  microprocessors provide a limited amount of cache memory on
  the CPU chip itself. The external caches supported by the
  IDT79R3000 can be much larger; while a small cache can
  improve performance of some programs, significant
  improvements for a wide range of programs require large
  caches.

• Separate Caches for data and instructions: Even with
  high-speed caches, memory speed can still be a limiting factor
  because of the fast cycle time of a high-performance
  microprocessor. The IDT79R3000 supports separate caches
  for instructions and data and alternates accesses of the two
  caches during each CPU cycle. Thus, the processor can obtain
  data and instructions at the cycle rate of the CPU using caches
  constructed with commercially available IDT static RAM
  devices.

  In order to maximize bandwidth in the cache while minimizing
  the requirement for SRAM access speed, the R3000 divides a
  single processor clock cycle into two phases. During one
  phase, the address for the data cache access is presented while
  data previously addressed in the instruction cache is read;
  during the next phase, the data operation is completed while the
  instruction cache is being addressed. Thus, both caches are
  read in a single processor cycle using only one set of address
  and data pins.

• Write Buffer: In order to ensure data consistency, all data that
  is written to the data cache must also be written out to main
  memory. The cache write model used by the IDT79R3000 is
  that of a write-through cache; that is, all data written by the CPU
  is immediately written into the main memory. To relieve the CPU
  of this responsibility (and the inherent performance burden) the
  IDT79R3000 supports an interface to a write buffer. The
  IDT79R3020 Write Buffer captures data (and associated
  addresses) output by the CPU and ensures that the data is
  passed on to main memory.
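
The two-phase cache access scheme also accounts for the peak cache bandwidth. As a rough worked example, assume one 32-bit instruction fetch and one 32-bit data access in every cycle; the helper name and the exact clock values are illustrative, and the result is an upper bound that ignores stalls.

```python
def cache_bandwidth(clock_hz, word_bytes=4, accesses_per_cycle=2):
    """Peak bytes per second moved between the CPU and its two caches."""
    return clock_hz * word_bytes * accesses_per_cycle
```

At a nominal 33 MHz this gives 264,000,000 bytes per second, which appears consistent with the "over 260 Mbytes/second" figure quoted in the description; at 25 MHz the same arithmetic gives 200 Mbytes/second.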
Figure 10. An IDT79R3000 System with a
High-Performance Memory System


IDT79R3000 Processor Subsystem Interfaces


Figure 11 illustrates the three subsystem interfaces provided by 
the IDT79R3000 processor: 


• Cache control interface (on-chip) for separate data and
  instruction caches permits implementation of off-chip caches
  using standard IDT SRAM devices. The 79R3000 directly
  controls the cache memory with a minimum of external
  components. Both the instruction and data cache can vary from
  4 to 256 Kbytes (64K entries). The 79R3000 also includes the
  TAG control logic which determines whether or not the entry
  read from the cache is the desired data.


The 79R3000 cache controller implements a direct mapped 
cache for high net performance (bandwidth). It has the ability to 
refill multiple words when a cache miss occurs, thus reducing 
the effective miss rate to less than 2% for large caches. When a 
cache miss occurs, the 79R3000 can support refilling the cache 
in 1, 4, 8, 16, or 32 word blocks to minimize the effective penalty 
of having to access main memory. The 79R3000 also 
incorporates the ability to perform instruction streaming; while 
the cache is refilling, the processor can resume execution once 
the missed word is obtained from main memory. In this way, the 
processor can continue to execute concurrently with the cache 
block refill. 


• Memory controller interface for system (main) memory. This
  interface also includes the logic and signals to allow operation
  with a write buffer to further improve memory bandwidth. In
  addition to the standard full word access, the memory controller
  supports the ability to write bytes and half-words by using partial
  word operations. The memory controller also supports the
  ability to retry memory accesses if, for example, the data
  returned from memory is invalid and a bus error needs to be
  signalled.


• Coprocessor Interface: The IDT79R3000 features a tightly
  coupled coprocessor interface in which all coprocessors
  maintain synchronization with the main processor; reside on the
  same data bus as the main processor; and participate in bus
  transactions in an identical manner to the main processor. The
  IDT79R3000 generates all required cache and memory control
  signals, including cache and memory addresses for attached
  coprocessors. As a result, only the data bus and a few control
  signals need to be connected to a coprocessor.


The interface supports three types of coprocessor instructions: 
loads/stores, coprocessor operations, and processor- 
coprocessor transfers. Note that coprocessor loads and stores 
occur directly between the coprocessor and memory, without 
requiring the data to go through the CPU. 


Synchronization between the CPU and external coprocessors
is achieved using a Phase-Locked Loop interface to the
coprocessor. The coprocessor physical interface also includes
coprocessor condition signals (CpCond(n)), which are used in
coprocessor branch instructions, and a coprocessor busy signal
(CpBusy) which is used to stall the CPU if the coprocessor
needs to hold off subsequent operations.

Finally, a precise exception interface is defined between the 
CPU and coprocessors using the external interrupt inputs of the 
CPU. This allows a coprocessor exception, even if it was the 
result of a multi-cycle operation, to be traced to the precise 
coprocessor operation which caused it. This is an important 
feature for languages which can define specific error handlers 
for each task. 

The interface supports up to four separate coprocessors. 
Coprocessor 0 is defined to be the system control coprocessor, 
and resides on the same chip as the CPU unit. Coprocessor 1 is 
the Floating Point Accelerator, IDT 79R3010. Coprocessors 2 
and 3 are available to support an interface to application specific 
functions. 
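
The direct-mapped lookup and multiword refill described for the cache control interface can be sketched as follows. Several points are assumptions rather than statements from this data sheet: the tag is taken as the upper 20 address bits (matching the 20-bit Tag bus, Tag12 through Tag31), the index is the word address within the cache, and a refill is modeled as fetching the whole aligned block that contains the missed word, starting at the block base. The function names are hypothetical.

```python
WORD_BYTES = 4
VALID_CACHE_SIZES = {kb * 1024 for kb in (4, 8, 16, 32, 64, 128, 256)}
VALID_BLOCK_WORDS = {1, 4, 8, 16, 32}


def split_address(paddr, cache_bytes):
    """Return (word index into the cache, 20-bit tag) for a physical address."""
    if cache_bytes not in VALID_CACHE_SIZES:
        raise ValueError("cache size must be a power of two from 4K to 256K")
    index = (paddr & (cache_bytes - 1)) // WORD_BYTES
    tag = (paddr & 0xFFFFFFFF) >> 12
    return index, tag


def refill_block(miss_addr, block_words):
    """Word addresses of the aligned block containing miss_addr."""
    if block_words not in VALID_BLOCK_WORDS:
        raise ValueError("block size must be 1, 4, 8, 16 or 32 words")
    block_bytes = block_words * WORD_BYTES
    base = miss_addr & ~(block_bytes - 1)
    return [base + WORD_BYTES * i for i in range(block_words)]
```

A 256 Kbyte cache has 65,536 word entries, which matches the "64K entries" figure in the text. With instruction streaming, execution would resume as soon as the missed word in such a block arrives, while the remaining words continue to fill.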
MULTIPROCESSING SUPPORT


The IDT79R3000 supports multiprocessing applications in a 
simple but effective way. Multiprocessing applications require 
cache coherency across the multiple processors. The 
IDT79R3000 offers two signals to support cache coherency: the 
first, MPStall, stalls the processor within two cycles of being re- 
ceived and keeps it from accessing the cache. This allows an ex- 
ternal agent to snoop into the processor data cache. The second 
signal, MPInvalidate, causes the processor to write data on the 
data cache bus which indicates the externally addressed cache 
entry is invalid. Thus, a subsequent access to that location would 
result in a cache miss, and the data would be obtained from main
memory. 

The two MP signals would be generated by external logic which
utilizes a secondary cache to perform bus snooping functions. The 
79R3000 does not impose an architecture for this secondary 
cache, but rather is flexible enough to support a variety of applica- 
tion specific architectures and still maintain cache coherency. Fur- 
ther, there is no impact on designs which do not require this fea- 
ture. 


ADVANCED FEATURES 


The IDT79R3000 offers a number of additional features such as 
the ability to swap the instruction and data caches, facilitating diag- 
nostics and cache flushing. Another feature isolates the caches, 
which forces cache hits to occur regardless of the contents of the 
tag fields. 

Further features of the IDT79R3000 are configured during the 
last four cycles prior to the negation of the RESET input. These 
functions include the ability to select cache sizes and cache refill 
block sizes; the ability to utilize the multiprocessor interface; 
whether or not instruction streaming is enabled; whether byte or- 
dering follows “Big-Endian” or “Little-Endian” protocols, etc. Table 
3 shows the configuration options selected at Reset. These are fur- 
ther discussed in the “Hardware User’s Manual”. 


BACKWARD COMPATIBILITY WITH 79R2000 


The IDT79R3000 can be used in sockets designed for the 
79R2000A. The pin-out of the 79R3000 has been selected to en- 
sure this compatibility, with new functions mapped onto previously 
unused pins. The instruction set is compatible with that of the 
79R2000 at the binary level. As a result, code written for the older 
processor can be executed. New features, such as block refill, in- 
struction streaming, etc. can be selectively disabled. 

In most 79R2000A applications, the 79R3000 can be placed in 
the socket with no modification to initialization settings. The initiali- 
zation of the 79R3000 includes whether or not the device should 
operate as a 79R2000A. Systems using 79R2000A would nor- 
mally have this input configured so that the device would default to 
this mode. Further application assistance on this topic is available 
from IDT. 


A SPECIAL NOTE ON PACKAGING 


Both the flat pack and the PGA packages for the 79R3000 incor- 
porate separate power and ground planes to eliminate noise asso- 
ciated with high frequency operation. This, coupled with the nu- 
merous power and ground pins provided on the device, helps to 
ensure very reliable operation. 
INPUT   W CYCLE           X CYCLE           Y CYCLE           Z CYCLE

        DBlkSize0         DBlkSize1         Extend Cache      BigEndian
        IBlkSize0         IBlkSize1         Reserved(1)       TriState
        Reserved(1)       IStream           Reserved(1)       NoCache
        Reserved(1)       StorePartial      MultiProcessor    BusDriveOn
        PhaseDelayOn(2)   PhaseDelayOn(2)   PhaseDelayOn(2)   PhaseDelayOn(2)
        R3000 Mode(2)     R3000 Mode(2)     R3000 Mode(2)     R3000 Mode(2)

NOTES:
1. Reserved entries must be driven high.
2. These values must be driven stable throughout the entire RESET period.


Table 3: IDT79R3000 Mode Selectable Features


[Figure: block diagram of the IDT79R3000 Processor with System Control
Coprocessor, connected to the Instruction Cache and Data Cache over the
AdrLo bus (through a transparent latch) and the Tag, TagV, TagP, Data
and DataP buses, with the IClk and DClk latch clocks. Other signals
shown include Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, XEn, SysOut,
AccTy[2:0], Reset, MemRd, RdBusy, WrBusy, BusError, Intr(5:0), and,
toward the coprocessors, CpBusy, CpCond[0] and CpCond[3:1].]

Figure 11. IDT79R3000 Subsystem Interfaces Example; 64 KB Caches
PIN CONFIGURATION

[Figure: pin assignment drawing for the flatpack package]

172-PIN CERAMIC FLATPACK
(Cavity Side View)
NOTES: 


1. Reserved pins must not be connected. 


2. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. 


AdrLo 16: MP Invalidate, CpCond (2). 
AdrLo 17: MP Stall, CpCond (3). 
PIN CONFIGURATION


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
AdrLo | AdrLo | AdrLo AdrLo | AdrLo |CpCondjAdrLoJAdrLo"| Thtrd | IntrS | _Wr_ | Reset |vcc10 






















A 
B AdrLo | DRd2 | AdrLo | AdrLo | AdrLo AdrLo |CpCond_Intr1 Cp Bus Tag12 | Tag15 
3 7 9 12 13 1 Busy | Error 
c | AdrLo | AdrLo | VCC13] AdrLo | AdrLo | GND13] GND12} VCC11] Intro | Intr4 Rd |GND11] Tag13 | TagPo | Tagi8 
0 4 5 8 Busy 
D AdrLo | GNDO Tag14 { Tag17 | Tag19 
2 
fe | DataP | Data | AdrLo Tag16 | Tag20 | VCC9 
0 0 1 
F | VCCO a2 | oe | GND10] Tag21 | Tag23 
7 
E oe | oe GND! enpe | Tag22 | TagP1 
4 
H | Data | Data | Data VCC8 | Tag25 | Tag24 
6 5 8 
J | Data | DataP | Data Tag28 | Tag29 | Tag26 
10 1 9 
K | Data | Data | GND2 GND8 | Tag | Tag27 
15 11 P2 
L | VCC1 Data Acc | Tag31 } Tag30 
17 Typ2 
M | Data | Data | DataP GND7 |} Acc | VCC7 
13 16 2 Typ1 
N | Data | Data | Data |GND3| Data | Data | VCC3| vCC4|GND5 | GND6 | DRdt | Mem TagV 
14 | 18 | 19 24 | P3 Wr 
p | Data | Data | lWr2 | Data | Data | Data | XEn | Data | Clk2x | Clk2x | DCik | IRdt | TWrt | Cp | Acc 
23 20 22 26 27 30 Sys Rd Syne | TypO 
Q | VCC2]| Data | Data | Data | Data | GND4] Data | Excep | Clk2x | Clk2x |SysOut] VCCS5 | IClk | DWr1 | VCC6 
21 25 31 28 29 tion Phi Smp 
144-Pin PGA (Top View) 
NOTE:
1. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time.
   AdrLo 16: MP Invalidate, CpCond (2).
   AdrLo 17: MP Stall, CpCond (3).
PIN DESCRIPTIONS

PIN NAME | I/O | DESCRIPTION
Data (0-31) | I/O | A 32-bit bus used for all instruction and data transmission among the processor, caches, memory interface, and coprocessors.
DataP (0-3) | I/O | A 4-bit bus containing even parity over the data bus.
Tag (12-31) | I/O | A 20-bit bus used for transferring cache tags and high addresses between the processor, caches, and memory interface.
TagV | | The tag validity indicator.
TagP (0-2) | I/O | A 3-bit bus containing even parity over the concatenation of TagV and Tag.
AdrLo (0-17) | | An 18-bit bus containing byte addresses used for transferring low addresses from the processor to the caches and memory interface. (AdrLo 16: CpCond (2), AdrLo 17: CpCond (3) set by reset initialization).
IRd1 | | Read enable for the instruction cache.
IWr1 | | Write enable for the instruction cache.
IRd2 | | An identical copy of IRd1 used to split the load.
IWr2 | | An identical copy of IWr1 used to split the load.
IClk | | The instruction cache address latch clock. This clock runs continuously.
DRd1 | | The read enable for the data cache.
DWr1 | | The write enable for the data cache.
DRd2 | | An identical copy of DRd1 used to split the load.
DWr2 | | An identical copy of DWr1 used to split the load.
DClk | | The data cache address latch clock. This clock runs continuously.
XEn | | The read enable for the Read Buffer.
AccTyp (0-2) | | A 3-bit bus used to indicate the size of data being transferred on the data bus, whether or not a data transfer is occurring, and the purpose of the transfer.
MemWr | | Signals the occurrence of a main memory write.
MemRd | | Signals the occurrence of a main memory read.
BusError | | Signals the occurrence of a bus error during a main memory read or write.
Run | | Indicates whether the processor is in the run or stall state.
Exception | | Indicates that the instruction about to commit state should be aborted and other exception related information.
SysOut | | A reflection of the internal processor clock used to generate the system clock.
CpSync | | A clock which is identical to SysOut and used by coprocessors for timing synchronization with the CPU.
RdBusy | | The main memory read stall termination signal. In most system designs RdBusy is normally asserted and is deasserted only to indicate the successful completion of a memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy | | The main memory write stall initiation/termination signal.
CpBusy | | The coprocessor busy stall initiation/termination signal.
CpCond (0-1) | | A 2-bit bus used to transfer conditional branch status from the coprocessors to the main processor.
CpCond (2-3) | | Conditional branch status from coprocessors to the processor. Function is provided on AdrLo 16/17 pins and is selected at reset time.
MPStall | | Multiprocessing Stall. Signals to the processor that it should stall accesses to the caches in a multiprocessing environment. This is physically the same pin as CpCond3; its use is determined at RESET initialization.
MPInvalidate | | Multiprocessing Invalidate. Signals to the processor that it should issue invalidate data on the cache data bus. The address to be invalidated is externally provided. This is the same pin as CpCond2; its use is determined at RESET initialization.
Int (0-5) | | A 6-bit bus used by the memory interface and coprocessors to signal maskable interrupts to the processor. At reset time, mode select values are read in.
Clk2xSys | | The master double frequency input clock used for generating SysOut.
Clk2xSmp | | A double frequency clock input used to determine the sample point for data coming into the processor and coprocessors.
Clk2xRd | | A double frequency clock input used to determine the enable time of the cache RAMs.
Clk2xPhi | | A double frequency clock input used to determine the position of the internal phase 1 and phase 2.
Reset | | Synchronous initialization input used to force execution starting from the reset memory address. Reset must be deasserted synchronously but asserted asynchronously. The deassertion of reset must be synchronized by the leading edge of SysOut.
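
The DataP description above says only that the 4-bit bus carries even parity over the data bus. As a rough illustration, the sketch below assumes one parity bit per byte of the 32-bit word, with parity bit i covering data bits 8*i to 8*i+7. That byte-to-bit assignment and the function names are assumptions made for illustration, not taken from this data sheet.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the count of 1s in (byte + parity bit) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(word: int) -> int:
    """Return a 4-bit even-parity value for a 32-bit word.

    Assumption (not stated in the data sheet): parity bit i covers
    data bits 8*i .. 8*i+7.
    """
    parity = 0
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        parity |= even_parity_bit(byte) << i
    return parity
```

A checker on the receiving side would recompute `data_parity` over the word it sampled and compare the result with the DataP value it sampled alongside it.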
ABSOLUTE MAXIMUM RATINGS

SYMBOL | RATING | COMMERCIAL | MILITARY | UNIT
VTERM | Terminal Voltage With Respect to GND | -0.5 to +7.0 | -0.5 to +7.0 | V
 | Operating Temperature | | | °C
 | Temperature Under Bias | -55 to +125 | -65 to +135 | °C
VIN | Input Voltage | -0.5 to +7.0 | -0.5 to +7.0 | V

NOTES:

1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating only and functional operation of the device at these or any other conditions above those indicated in the operational sections of this specification is not implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed VCC + 0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short should not exceed 30 seconds.

RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE

GRADE | TEMPERATURE | VCC
Military | -55°C to +125°C | 5.0 ± 10%
Commercial | 0°C to +70°C | 5.0 ± 5%

OUTPUT LOADING FOR AC TESTING

[Figure: AC test load circuit, labeled "To Device Under Test".]





DC ELECTRICAL CHARACTERISTICS -
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)


16.67MHz 
PARAMETER TEST CONDITIONS 


Output HIGH Voltage Voc =Min., low=—4mA | 3.5 — | 
Output LOW Voltage Vcc = Min., lo. = 4mA he Oa 
Voc | Output HIGH Voltage(”) Vcc = Min., lon = —4mA 


HT Output HIGH Voltage (46) =| Vcc = Min., lon = -8mA 










Vout Voc = Min., lou = 8mA 
Vin 
Vit 
VIHS 
N 
UT Output Capacitance (6) 
: Voo = Max. 
Vin = Voc 

Vu. = GND 

Vou = 2.4, Ve = OV 
NOTES:

1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give the designer further information about these specific signals.
5. VIH should not be held above VCC + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to Run and Exception.


DC ELECTRICAL CHARACTERISTICS -
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, VCC = +5.0V ± 10%)


16.67MHz 
PARAMETER TEST CONDITIONS sag site UNIT 


VIH | Input HIGH Voltage
VIL | Input LOW Voltage (1)
VIHS | Input HIGH Voltage (2, 5)
VILS | Input LOW Voltage (1, 2)
CIN | Input Capacitance (6)
COUT | Output Capacitance (6)
ICC | Operating Current

NOTES:

1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give the designer further information about these specific signals.
5. VIH should not be held above VCC + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to Run and Exception.


AC ELECTRICAL CHARACTERISTICS (1, 2, 3) -
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)


16.67MHz | 20.0MHz 
SYMBOL PARAMETER TEST CONDITIONS| ,16.67MH2 | 20.0MH2 ” Pee a ecard UNIT 


Clock 


Input Clock High® | Transition <sns_[ 125 — | 10 — | 8 — [268 — | 
Input Clock Low”) | Transition<sns_| 125 — | 10 — | 8 — | | ns | 
30 


TcKP Input Clock Period(?) 500 | 25 500 




















— 













20 500 





























Clk2xSys to Clk2xSmp(®) 0 tcyc/4} 0 teyc/4]} 0 tcyc/4 
Clk2xSmp to Clk2xRd(®) QO tcyc/4| 0 tcyc/4} 0 tcyc/4 
Clk2xSmp to Clk2xPhi(®) 9 tcyc/4} 7 tcyc/4 5 tcyc/4 






Run Operation 


Data Enabie® eo? eel P| 
Data Disable i a 
Data Vali load=25eF | — 3 |— 3 | 






Write Delay Load = 25pF ae 
[Tos [| Datase-up | pe ee 
Data Hold Pi es = fees — | es — I. 





feebe ee eee 

[Taery | Access Type (20) | Lead=25pF | — 7 |— 6 _| 

—- 4 |—- 2 

| 

Stall Operation 

[Towa | AdsrossVaid | Lead=2epF [| — 90 |] 
: 

i 

- 


















Exception Valid 
Reset Initialization 
[Test | ResetPusewidth | 
Reset timing, Phase—lock on(4: 5) Pe ll 





NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
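
The phase-lock reset requirement in the notes above ("the longer of 3000 clock cycles or 200 microseconds") is a simple maximum of two durations. The sketch below works it out for a given clock rate. It assumes that "clock cycles" means CPU clock cycles (Tcyc) rather than 2x clock cycles, which the note does not spell out; the function and parameter names are illustrative only.

```python
def min_reset_assertion_us(cpu_clock_mhz: float,
                           cycles: int = 3000,
                           floor_us: float = 200.0) -> float:
    """Return the minimum Reset assertion time in microseconds.

    Assumption: the 3000-cycle count refers to CPU clock cycles (Tcyc),
    each lasting 1 / cpu_clock_mhz microseconds.
    """
    cycle_time_us = 1.0 / cpu_clock_mhz
    return max(cycles * cycle_time_us, floor_us)
```

Under that assumption, 3000 cycles at any of the listed speed grades (16.67 MHz and up) take less than 200 microseconds, so the 200 microsecond floor governs; the cycle count only dominates when the part is clocked below 15 MHz.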
AC ELECTRICAL CHARACTERISTICS (1, 2, 3) -
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, VCC = +5.0V ± 10%)


SYMBOL PARAMETER TEST CONDITIONS 16.67MHz UNIT 


Input Clock High’) 
Input Clock Low’ 


Tckp Input Clock Period!) 
Clk2xSys to Clk2xSmp\®) 
Clk2xSmp to Clk2xRd(6) 
Clk2xSmp to Clk2xPhi(®) 


Run Operation 


TDEn | Data Enable
TDDis | Data Disable
TDS | Data Set-up
TDH | Data Hold
Stall Operation 


Reset Initialization


Reset Pulse Width 


Reset timing, Phase-lock on(*: 5) 
Reset timing, Phase-lock off(4. 5) 


Capacitive Load Deration 


|CLD__| Load Derate( CB 


NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.





















































Load = 25pF 
Load = 25pF 
Load = 25pF 
Load = 25pF 







Load = 250 








oi 
[Timing diagram: the Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi input clocks, with the Tcklow, Tckp, and Trd parameters marked.]

Figure 12. Input Clock Timing 


SysOut 


SmpOut* 








* These signals are not actually output from the processor. They are drawn to provide 
a reference for other timing diagrams. 


Figure 13. Processor Reference Clock Timing 
[Timing diagram: AddrLo (alternating data and instruction addresses), AccTyp 0:1, AccTyp 2, the Data and Tag busses, and IClk across phases 1 and 2, with the Tsys parameter marked.]

Figure 14. Synchronous Memory (Cache) Timing 
[Timing diagram: AddrLo (data address), Tag (Address High), AccTyp 0:1, AccTyp 2, and Data (Output) during a memory write, with the Tden and Tacty parameters marked.]

Figure 15. Memory Write Timing 
[Timing diagram: SysOut, PhiOut, AddrLo (read address), Tag (Address High), AccTyp 0:1 (data size), AccTyp 2, Data (Input), MemRd, RdBusy, XEn, and CpCond0 during a memory read, with the Tacty, Tsacty, and Tstl parameters marked.]

Figure 16. Memory Read Timing
[Timing diagram: Data Bus, Run, CpBusy, Exception, and CpCond(n) during a Co-Processor Store and a Co-Processor Load, with the point at which the condition is valid marked.]

Figure 17. Co-Processor Load/Store Timing
[Timing diagram: SysOut and PhiOut across phases 1 and 2.]

Figure 18. Interrupt Timing 
NOTES:

1. Reset must be negated synchronously; however, it can be asserted asynchronously. Designs should not rely on the proper functioning of SysOut prior to the assertion of Reset.

2. If Phase-Lock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to ensure that the slowest co-processor in the system has sufficient time to lock the CPU clocks.

3. Reset is actually sampled in both Phase 1 and Phase 2. To ensure proper initialization, it is recommended that Reset be negated relative to the end of Phase 1.


Figure 19. Mode Vector Initialization 
ORDERING INFORMATION

IDT   XXXXX         XX      X         X
      Device Type   Speed   Package   Process/Temp. Range

Process/Temp. Range:
  Blank    Commercial (0°C to +70°C)
  B        Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B
  M        Military Temperature Range Only

Package:
  G        144-Pin PGA
  F        172-Pin Flat Pack

Speed:
  16       16.67 MHz
  20       20.0 MHz
  25       25.0 MHz
  33       33.33 MHz

Device Type:
  79R3000  RISC CPU Processor


RISC CPU PROCESSOR
PRELIMINARY
IDT79R3000A
IDT79R3000AE

Integrated Device Technology, Inc.


FEATURES:

• Enhanced instruction set compatible version of the
  IDT79R2000, IDT79R3000 RISC CPUs.

• Upwardly pin-compatible with IDT79R3000 RISC CPU.

• IDT79R3000A "E" version relaxes system memory timing
  requirements.

• Full 32-bit Operation: Thirty-two 32-bit registers and all
  instructions and addresses are 32-bit.

• Efficient Pipelining: The CPU's 5-stage pipeline design
  assists in obtaining an execution rate approaching one
  instruction per cycle. Pipeline stalls and exceptions are
  handled precisely and efficiently.

• On-Chip Cache Control: The IDT79R3000 provides a high
  bandwidth memory interface that handles separate external
  Instruction and Data Caches ranging in size from 4 to
  256 Kbytes each. Both the caches are accessed during a
  single CPU cycle. All cache control is on-chip.

• On-Chip Memory Management Unit: A fully-associative, 64
  entry Translation Lookaside Buffer (TLB) provides fast
  address translation for virtual-to-physical memory mapping of
  the 4 Gigabyte virtual address space.

• Coprocessor Interface: The IDT79R3000 generates all
  addresses and handles memory interface control for up to
  three additional tightly coupled external processors.

• Optimizing Compilers are available for C, Fortran, Pascal,
  COBOL, Ada, and PL/1.

• UNIX™ System V.3 and BSD 4.3 operating systems
  supported.

• High-speed CEMOS™ technology.

• Instruction set compatible with the IDT79R2000 RISC CPU.

• 16.7MHz, 20MHz, 25MHz and 33MHz clock rates yield up to
  28 MIPS sustained throughput.

• Supports independent multiword block refill of both the
  instruction and data caches with variable block sizes.

• Supports concurrent refill and execution of instructions.

• Partial word stores executed as read-modify-write operations.

• 6 external interrupt inputs (up to 64 different sources), 2
  software interrupts, with single cycle latency to exception
  handler routine.

• Flexible multiprocessing support on chip with no impact on
  uniprocessor designs.

• Military product compliant to MIL-STD-883, Class B.


DESCRIPTION: 


The IDT79R3000A RISC Microprocessor consists of two tightly-coupled
processors integrated on a single chip. The first processor is a
full 32-bit CPU based on RISC (Reduced Instruction Set Computer)
principles to achieve a new standard of microprocessor performance.
The second processor is a system control coprocessor, called CP0,
containing a fully-associative 64 entry TLB (Translation Lookaside
Buffer), MMU (Memory Management Unit) and control registers,
supporting a 4 Gigabyte virtual memory subsystem, and a Harvard
Architecture Cache Controller achieving a bandwidth of over
260 Mbytes/second using industry standard static RAMs.
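
One plausible reading of the "over 260 Mbytes/second" figure is sketched below: both external caches are accessed in a single CPU cycle over a 32-bit data bus, so the peak rate is two 4-byte accesses per cycle. This derivation is an inference from the feature list, not a calculation given in the data sheet, and the names are illustrative.

```python
BYTES_PER_ACCESS = 4     # one 32-bit word per cache access
ACCESSES_PER_CYCLE = 2   # instruction cache and data cache, once each per CPU cycle


def peak_cache_bandwidth_mbytes(cpu_clock_mhz: float) -> float:
    """Peak cache bandwidth in Mbytes/second for a given CPU clock (MHz).

    Assumes every cycle performs one instruction fetch and one data access.
    """
    return cpu_clock_mhz * ACCESSES_PER_CYCLE * BYTES_PER_ACCESS
```

At the 33.33 MHz speed grade this gives roughly 266 Mbytes/second, which is consistent with the quoted figure.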

This data sheet provides an overview of the features and architecture
of the 79R3000A CPU, Revision 3.0. A more detailed description of
the operation of the device is incorporated in the "R3000A Family
Hardware User Manual", and a more detailed architectural overview is
provided in the "mips RISC Architecture" book, both available from
IDT. Documentation providing details of the software and development
environments supporting this processor is also available from IDT.
IDT79R3000A CPU Registers

The IDT79R3000A CPU provides 32 general purpose 32-bit registers,
a 32-bit Program Counter, and two 32-bit registers that hold the
results of integer multiply and divide operations. Only two of the
32 general registers have a special purpose: register r0 is
hardwired to the value "0", which is a useful constant, and register
r31 is used as the link register in jump-and-link instructions
(return address for subroutine calls).

The CPU registers are shown in Figure 2. Note that there is no
Program Status Word (PSW) register shown in this figure: the
functions traditionally provided by a PSW register are instead
provided in the Status and Cause registers incorporated within the
System Control Coprocessor (CP0).
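
The register conventions described above can be modelled in a few lines. The sketch below is a minimal software model, not a description of the hardware: the class and method names are hypothetical, and the HI/LO names for the multiply/divide result registers are taken from the "Move From HI" and "Move From LO" entries in the instruction summary.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Toy model: 32 general registers, HI/LO, and a program counter."""

    def __init__(self) -> None:
        self._regs = [0] * 32
        self.hi = 0   # multiply/divide result register
        self.lo = 0   # multiply/divide result register
        self.pc = 0

    def read(self, n: int) -> int:
        return self._regs[n]

    def write(self, n: int, value: int) -> None:
        # r0 is hardwired to zero, so writes to it are discarded.
        if n != 0:
            self._regs[n] = value & MASK32

    def jump_and_link(self, target: int, return_address: int) -> None:
        # Jump-and-link saves the return address in r31.
        self.write(31, return_address)
        self.pc = target & MASK32
```

Discarding writes to r0 in `write` keeps every reader simple: no instruction ever needs a special case to obtain the constant zero.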


[Figure: General Purpose Registers (32 registers, bits 31-0); Multiply/Divide Registers (bits 31-0); Program Counter PC (bits 31-0).]

Figure 2. IDT79R3000A CPU Registers 


Instruction Set Overview

All IDT79R3000A instructions are 32 bits long, and there are only
three instruction formats. This approach simplifies instruction
decoding, thus minimizing instruction execution time. The 79R3000A
processor initiates a new instruction on every run cycle, and is able
to complete an instruction on almost every clock cycle. The only
exceptions are the Load instructions and Branch instructions,
which each have a single cycle of latency associated with their
execution. Note, however, that in the majority of cases the
compilers are able to fill these latency cycles with useful
instructions which do not require the result of the previous
instruction. This effectively eliminates these latency effects.

The actual instruction set of the CPU was determined after
extensive simulations to determine which instructions should be
implemented in hardware, and which operations are best synthesized
in software from other basic instructions. This methodology
resulted in the R3000A having the highest performance of any
available microprocessor.


I-Type (Immediate)
 31    26 25    21 20    16 15                          0
|   op   |   rs   |   rt   |         immediate          |

J-Type (Jump)
 31    26 25                                            0
|   op   |                   target                     |

R-Type (Register)
 31    26 25    21 20    16 15    11 10     6 5         0
|   op   |   rs   |   rt   |   rd   |  shamt  |  funct  |

Figure 3. IDT79R3000A Instruction Formats

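
The three 32-bit formats can be packed and unpacked with shifts and masks at the bit boundaries given in Figure 3 (31-26, 25-21, 20-16, 15-11, 10-6, 5-0). The sketch below is illustrative only. The field names `rd`, `shamt`, and `funct` follow common MIPS usage and are an assumption here, and the numeric field values in the usage note are arbitrary.

```python
def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type word: op[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)


def encode_j(op: int, target: int) -> int:
    """Pack a J-type word: op[31:26] target[25:0]."""
    return ((op & 0x3F) << 26) | (target & 0x3FFFFFF)


def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack an R-type word: op rs rt rd[15:11] shamt[10:6] funct[5:0]."""
    return (((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16)
            | ((rd & 0x1F) << 11) | ((shamt & 0x1F) << 6) | (funct & 0x3F))


def decode(word: int) -> dict:
    """Split a 32-bit word into every field any of the three formats defines."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "target": word & 0x3FFFFFF,
    }
```

For example, `decode(encode_i(0x23, 29, 8, 0xFFFC))` returns `op` 0x23, `rs` 29, `rt` 8 and `immediate` 0xFFFC. Because the `op` field sits in the same position in all three formats, a decoder can read it first and then pick the fields relevant to that format.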
The IDT79R3000A instruction set can be divided into the following
groups:

• Load/Store instructions move data between memory and general
  registers. They are all I-type instructions, since the only
  addressing mode supported is base register plus 16-bit, signed
  immediate offset.

  The Load instruction has a single cycle of latency, which means
  that the data being loaded is not available to the instruction
  immediately after the load instruction. The compiler will fill this
  delay slot with either an instruction which is not dependent on
  the loaded data, or with a NOP instruction. There is no latency
  associated with the store instruction.

  Loads and Stores can be performed on byte, half-word, word, or
  unaligned word data (32 bit data not aligned on a modulo-4
  address). The CPU cache is constructed as a write-through
  cache.

• Computational instructions perform arithmetic, logical and
  shift operations on values in registers. They occur in both R-type
  (both operands and the result are registers) and I-type (one
  operand is a 16-bit immediate) formats.

  Note that computational instructions are three operand
  instructions; that is, the result of the operation can be stored into
  a different register than either of the two operands. This means
  that operands need not be overwritten by arithmetic operations.
  This results in a more efficient use of the large register set.

• Jump and Branch instructions change the control flow of a
  program. Jumps are always to a paged absolute address
  formed by combining a 26-bit target with four bits of the Program
  Counter (J-type format, for subroutine calls), or 32-bit register
  byte addresses (R-type, for returns and dispatches). Branches
  have 16-bit offsets relative to the program counter (I-type).
  Jump and Link instructions save a return address in Register
  31. The 79R3000A instruction set features a number of branch
  conditions. Included is the ability to compare a register to zero
  and branch, and also the ability to branch based on a
  comparison between two registers. Thus, net performance is
  increased since software does not have to perform arithmetic
  instructions prior to the branch to set up the branch conditions.

• Coprocessor instructions perform operations in the
  coprocessors. Coprocessor Loads and Stores are I-type.
  Coprocessor computational instructions have coprocessor-
  dependent formats (see coprocessor manuals).

• Coprocessor 0 instructions perform operations on the System
  Control Coprocessor (CP0) registers to manipulate the memory
  management and exception handling facilities of the processor.

• Special instructions perform a variety of tasks, including
  movement of data between special and general registers,
  system calls, and breakpoint. They are always R-type.

Table 1 lists the instruction set of the IDT79R3000A processor.
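
The address calculations described in the groups above can be sketched as follows. The text states that loads and stores use a base register plus a signed 16-bit offset, that jumps combine a 26-bit target with four bits of the Program Counter, and that branches use a 16-bit PC-relative offset. The sketch additionally assumes, following common MIPS practice, that jump targets and branch offsets count 32-bit words and are therefore shifted left by two; that detail, the choice of which PC value the caller passes in, and all function names are assumptions rather than statements from this data sheet.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def load_store_address(base: int, offset16: int) -> int:
    """Base register plus signed 16-bit immediate offset (byte address)."""
    return (base + sign_extend16(offset16)) & MASK32


def jump_target(pc: int, target26: int) -> int:
    """Upper four PC bits combined with the 26-bit target (assumed word-scaled)."""
    return (pc & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


def branch_target(pc: int, offset16: int) -> int:
    """PC plus signed 16-bit offset (assumed word-scaled)."""
    return (pc + (sign_extend16(offset16) << 2)) & MASK32
```

Because a J-type jump keeps the upper four PC bits, it can only reach addresses inside the current 256 Mbyte region, which is why the text calls the result a paged absolute address; jumps through a register carry a full 32-bit address and have no such limit.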
OP        DESCRIPTION

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Coprocessor Instructions
LWCz      Load Word from Coprocessor
SWCz      Store Word to Coprocessor
MTCz      Move To Coprocessor
MFCz      Move From Coprocessor
CTCz      Move Control to Coprocessor
CFCz      Move Control From Coprocessor
COPz      Coprocessor Operation
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False

System Control Coprocessor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception

Table 1. IDT79R3000A Instruction Summary
IDT79R3000A System Control Coprocessor (CP0)

The IDT79R3000A can operate with up to four tightly-coupled
coprocessors (designated CP0 through CP3). The System Control
Coprocessor (or CP0) is incorporated on the IDT79R3000 chip
and supports the virtual memory system and exception handling
functions of the IDT79R3000A. The virtual memory system is
implemented using a Translation Lookaside Buffer and a group of
programmable registers as shown in Figure 4.


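[Figure: the System Coprocessor registers and the 64-entry TLB. ENTRYHI and ENTRYLO access the TLB; RANDOM selects among entries 8 to 63, and the entries below 8 are marked "NOT ACCESSED BY RANDOM". A legend distinguishes registers used with the Virtual Memory System from those used with Exception Processing.]

Figure 4. The System Coprocessor Registers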
The CP0 registers shown in Figure 4 are used to control the
memory management and exception handling capabilities of the
IDT79R3000A. Table 2 provides a brief description of each
register.

REGISTER | DESCRIPTION
EntryHi  | High half of a TLB entry
EntryLo  | Low half of a TLB entry
Index    | Programmable pointer into TLB array
Random   | Pseudo-random pointer into TLB array
Status   | Mode, interrupt enables, and diagnostic status info
Cause    | Indicates nature of last exception
EPC      | Exception Program Counter
Context  | Pointer into kernel's virtual Page Table Entry array
BadVA    | Most recent bad virtual address
PRId     | Processor revision identification (Read only)

Table 2. System Control Coprocessor (CP0) Registers
Memory Management System

The IDT79R3000A has an addressing range of 4 Gbytes.
However, since most IDT79R3000A systems implement a physical
memory smaller than 4 Gbytes, the IDT79R3000A provides for
the logical expansion of memory space by translating addresses
composed in a large virtual address space into available physical
memory addresses. The 4 Gbyte address space is divided into
2 Gbytes which can be accessed by both the users and the kernel,
and 2 Gbytes for the kernel only.


The TLB (Translation Lookaside Buffer)

Virtual memory mapping is assisted by the Translation
Lookaside Buffer (TLB). The on-chip TLB provides very fast virtual
memory access and is well-matched to the requirements of
multitasking operating systems. The fully-associative TLB contains
64 entries, each of which maps a 4-Kbyte page, with controls for
read/write access, cacheability, and process identification. The
TLB allows each user to access up to 2 Gbytes of virtual address
space.

Figure 5 illustrates the format of each TLB entry. The translation
operation involves matching the current Process ID (PID) and upper
20 bits of the address against the PID and VPN (Virtual Page
Number) fields in the TLB. When both match (or the TLB entry is
Global), the VPN is replaced with the PFN (Physical Frame Number)
to form the physical address.

TLB misses are handled in software, with the entry to be replaced
determined by a simple RANDOM function. The routine to
process a TLB miss in the UNIX environment requires only 10-12
cycles, which compares favorably with many CPUs which perform
the operation in hardware.


TLB ENTRY FORMAT

ENTRYHI
 63          44 43      38 37      32
|     VPN      |  TLBPID  |    0     |

ENTRYLO
 31          12  11  10   9   8  7      0
|     PFN      | N | D | V | G |    0    |

VPN    - Virtual Page number
TLBPID - Process ID
PFN    - Physical frame number
N      - Non-cacheable flag
D      - Dirty flag (Write protect)
V      - Valid entry flag
G      - Global flag (ignore PID)
0      - Reserved

Figure 5. TLB Entry Format
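
The matching rule described for the TLB (compare the current PID and the upper 20 address bits against each entry, ignore the PID for Global entries, then substitute the PFN for the VPN) can be sketched in software as below. This is a minimal illustrative model with hypothetical names. It searches entries one by one, whereas the hardware compares all 64 in parallel, and it does not model the N, D, or V flags or the exception raised on a miss.

```python
from dataclasses import dataclass
from typing import List

PAGE_SHIFT = 12                      # 4-Kbyte pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # low 12 bits pass through untranslated


@dataclass
class TlbEntry:
    vpn: int                    # virtual page number (upper 20 address bits)
    pid: int                    # process ID the mapping belongs to
    pfn: int                    # physical frame number
    is_global: bool = False     # Global flag: match regardless of PID


class TlbMiss(Exception):
    """Raised when no entry matches; real hardware traps to software here."""


def translate(tlb: List[TlbEntry], vaddr: int, current_pid: int) -> int:
    """Return the physical address for vaddr, or raise TlbMiss."""
    vpn = (vaddr >> PAGE_SHIFT) & 0xFFFFF
    for entry in tlb:
        if entry.vpn == vpn and (entry.is_global or entry.pid == current_pid):
            return (entry.pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK)
    raise TlbMiss(f"no TLB entry for VPN {vpn:#x}, PID {current_pid}")
```

With a TLB holding `TlbEntry(vpn=0x00400, pid=3, pfn=0x12345)`, the call `translate(tlb, 0x00400ABC, 3)` yields 0x12345ABC, while the same address under a different PID raises `TlbMiss`, which is the point at which the software refill routine would take over.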


IDT79R3000 Operating Modes

The IDT79R3000A has two operating modes: User mode and
Kernel mode. The IDT79R3000A normally operates in the User
mode until an exception is detected forcing it into the Kernel
mode. It remains in the Kernel mode until a Restore From
Exception (RFE) instruction is executed. The manner in which memory
addresses are translated or mapped depends on the operating
mode of the IDT79R3000A. Figure 6 shows the MMU translation
performed for each of the operating modes.
MMU ADDRESS TRANSLATION (VIRTUAL -> PHYSICAL)

VIRTUAL ADDRESS RANGE   | SEGMENT | ATTRIBUTES
0xC0000000 - 0xFFFFFFFF | kseg2   | Kernel, mapped, cacheable
0xA0000000 - 0xBFFFFFFF | kseg1   | Kernel, unmapped, uncached
0x80000000 - 0x9FFFFFFF | kseg0   | Kernel, unmapped, cached
0x00000000 - 0x7FFFFFFF | kuseg   | Kernel/user, mapped, cacheable

PHYSICAL ADDRESS RANGE  | SIZE
0x20000000 - 0xFFFFFFFF | 3584 MB
0x00000000 - 0x1FFFFFFF | 512 MB

Figure 6. IDT79R3000A Virtual Address Mapping



User Mode: in this mode, a single, uniform virtual address
space (kuseg) of 2 Gbytes is available. Each virtual address is
extended with a 6-bit process identifier field to form unique virtual
addresses. All references to this segment are mapped through the
TLB. Use of the cache for up to 64 processes is determined by bit
settings for each page within the TLB entries.

Kernel Mode: four separate segments are defined in this
mode:

• kuseg: when in the kernel mode, references to this segment
  are treated just like user mode references, thus streamlining
  kernel access to user data.

• kseg0: references to this 512 Mbyte segment use cache
  memory but are not mapped through the TLB. Instead, they
  always map to the first 0.5 Gbytes of physical address space.

• kseg1: references to this 512 Mbyte segment are not mapped
  through the TLB and do not use the cache. Instead, they are
  hard-mapped into the same 0.5 Gbyte segment of physical
  address space as kseg0.

• kseg2: references to this 1 Gbyte segment are always mapped
  through the TLB and use of the cache is determined by bit
  settings within the TLB entries.
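
The segment boundaries above can be expressed as a small classifier. The sketch below uses the boundary addresses from Figure 6 and, for the two unmapped segments, the fixed mapping onto the first 512 Mbytes of physical space. The function names are hypothetical, and mode checking (user-mode access to kernel segments) is not modelled.

```python
MASK32 = 0xFFFFFFFF


def segment(vaddr: int) -> str:
    """Name the virtual address segment that contains vaddr."""
    vaddr &= MASK32
    if vaddr < 0x80000000:
        return "kuseg"   # 2 Gbytes, mapped through the TLB
    if vaddr < 0xA0000000:
        return "kseg0"   # 512 Mbytes, unmapped, cached
    if vaddr < 0xC0000000:
        return "kseg1"   # 512 Mbytes, unmapped, uncached
    return "kseg2"       # 1 Gbyte, mapped through the TLB


def unmapped_physical(vaddr: int):
    """Return the physical address for kseg0/kseg1, or None if the TLB decides."""
    if segment(vaddr) in ("kseg0", "kseg1"):
        # Both segments alias the first 512 Mbytes of physical memory.
        return vaddr & 0x1FFFFFFF
    return None
```

Since kseg0 and kseg1 alias the same physical region, an address such as 0x9FC00000 and its kseg1 counterpart 0xBFC00000 reach the same location, one through the cache and one around it.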


IDT79R3000 Pipeline Architecture

The execution of a single IDT79R3000A instruction consists of
five primary steps:

1) IF  - Fetch the instruction (I-Cache).
2) RD  - Read any required operands from CPU registers
         while decoding the instruction.
3) ALU - Perform the required operation on instruction
         operands.
4) MEM - Access memory (D-Cache).
5) WB  - Write back results to register file.

Each of these steps requires approximately one CPU cycle as
shown in Figure 7 (parts of some operations overlap into another
cycle while other operations require only 1/2 cycle).


[Figure: Instruction Execution. A single instruction passes through the IF (I-Cache), RD (RF), ALU (OP), MEM (D-Cache), and WB stages, each about one cycle long.]



Figure 7. IDT79R3000A Instruction Pipeline 
The IDT79R3000A uses a 5-stage pipeline to achieve an
instruction execution rate approaching one instruction per CPU
cycle. Thus, execution of five instructions at a time is overlapped as
shown in Figure 8.


IDT79R3000A Instruction Pipeline
(5-deep)

Instruction   IF   RD   ALU  MEM  WB
Flow               IF   RD   ALU  MEM  WB
                        IF   RD   ALU  MEM  WB
                             IF   RD   ALU  MEM  WB
                                  IF   RD   ALU  MEM  WB
                                  ^
                                  Current CPU Cycle



Figure 8. IDT79R3000A Execution Sequence 


This pipeline operates efficiently because different CPU re- 
sources (address and data bus accesses, ALU operations, regis- 
ter accesses, and so on) are utilized on a non-interfering basis. 
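
The overlap described above can be made concrete with a toy model that assumes an ideal pipeline: one instruction enters per cycle and nothing stalls. The stage names come from the five-step list; the function names are hypothetical, and stalls, exceptions, and half-cycle operations are deliberately ignored.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def stage_of(instr_index: int, cycle: int):
    """Stage occupied by instruction instr_index during cycle, or None.

    Instruction i is assumed to enter IF in cycle i (no stalls).
    """
    position = cycle - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def occupancy(cycle: int, n_instructions: int) -> dict:
    """Map each busy stage to the index of the instruction occupying it."""
    result = {}
    for i in range(n_instructions):
        stage = stage_of(i, cycle)
        if stage is not None:
            result[stage] = i
    return result
```

From cycle 4 onward every stage holds a different instruction, so one instruction completes per cycle even though each one takes five cycles from fetch to write back.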


Memory System Hierarchy 

The high performance capabilities of the IDT79R3000A proces- 
sor demand system configurations incorporating techniques fre- 
quently employed in large, mainframe computers but seldom en- 
countered in systems based on more traditional microprocessors. 

A primary goal of systems employing RISC techniques is to minimize
the average number of cycles each instruction requires for
execution. To achieve this goal, RISC processors incorporate
a number of techniques, including a compact and uniform
instruction set, a deep instruction pipeline (as described
above), and the use of optimizing compilers. Many of the advantages
obtained from these techniques can, however, be negated
by an inefficient memory system.

Figure 9 illustrates memory in a simple microprocessor system. 
In this system, the CPU outputs addresses to memory and reads 
instructions and data from memory or writes data to memory. The 
address space is completely undifferentiated: instructions, data, 
and I/O devices are all treated the same. In such a system, a pri- 
mary limiting performance factor is memory bandwidth. 









[Figure 9 diagram: a processor (CPU) connected by data and address buses to a single memory (and I/O).]





Figure 9. A Simple Microprocessor Memory System 


Figure 10 illustrates a memory system that supports the signifi- 
cantly greater memory bandwidth required to take full advan- 
tage of the IDT79R3000A’s performance capabilities. The key fea- 
tures of this system are: 
• External Cache Memory: Local, high-speed memory (called
cache memory) is used to hold instructions and data that are
repetitively accessed by the CPU (for example, within a
program loop) and thus reduces the number of references that
must be made to the slower-speed main memory. Some
microprocessors provide a limited amount of cache memory on
the CPU chip itself. The external caches supported by the
IDT79R3000A can be much larger; while a small cache can
improve performance of some programs, significant
improvements for a wide range of programs require large
caches.

• Separate Caches for data and instructions: Even with
high-speed caches, memory speed can still be a limiting factor
because of the fast cycle time of a high-performance
microprocessor. The IDT79R3000A supports separate caches
for instructions and data and alternates accesses of the two
caches during each CPU cycle. Thus, the processor can obtain
data and instructions at the cycle rate of the CPU using caches
constructed with commercially available IDT static RAM
devices.
In order to maximize bandwidth in the cache while minimizing
the requirement for SRAM access speed, the R3000A divides a
single processor clock cycle into two phases. During one
phase, the address for the data cache access is presented while
data previously addressed in the instruction cache is read;
during the next phase, the data operation is completed while the
instruction cache is being addressed. Thus, both caches are
read in a single processor cycle using only one set of address
and data pins.

• Write Buffer: In order to ensure data consistency, all data that
is written to the data cache must also be written out to main
memory. The cache write model used by the IDT79R3000A is
that of a write-through cache; that is, all data written by the CPU
is immediately written into the main memory. To relieve the CPU
of this responsibility (and the inherent performance burden) the
IDT79R3000A supports an interface to a write buffer. The
IDT79R3020 Write Buffer captures data (and associated
addresses) output by the CPU and ensures that the data is
passed on to main memory.










[Figure 10 diagram: the IDT79R3000A microprocessor with an instruction cache, a data cache, and a write buffer in front of main memory.]


Figure 10. An IDT79R3000A System with a 
High-Performance Memory System 
IDT79R3000A Processor Subsystem Interfaces


Figure 11 illustrates the three subsystem interfaces provided by 
the IDT79R3000A processor: 


• Cache control interface (on-chip) for separate data and
instruction caches permits implementation of off-chip caches
using standard IDT SRAM devices. The 79R3000A directly
controls the cache memory with a minimum of external
components. Both the instruction and data cache can vary from
4 to 256 KBytes (64K entries). The 79R3000A also includes the
TAG control logic which determines whether or not the entry
read from the cache is the desired data.


The 79R3000A cache controller implements a direct mapped 
cache for high net performance (bandwidth). It has the ability to 
refill multiple words when a cache miss occurs, thus reducing 
the effective miss rate to less than 2% for large caches. When a
cache miss occurs, the 79R3000A can support refilling the 
cache in 1, 4, 8, 16, or 32 word blocks to minimize the effective 
penalty of having to access main memory. The 79R3000A also 
incorporates the ability to perform instruction streaming; while 
the cache is refilling, the processor can resume execution once 
the missed word is obtained from main memory. In this way, the
processor can continue to execute concurrently with the cache 
block refill. 
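
The effect of multiword refill on the miss count can be illustrated with a small model of a direct-mapped cache. This is a simplified sketch, not the processor's actual control logic: it assumes one 32-bit word per cache entry, a tag formed from physical address bits 31:12 (matching the 20-bit Tag bus), and block-aligned refills. The class and attribute names are hypothetical.

```python
WORD_BYTES = 4


class DirectMappedCache:
    """Toy direct-mapped cache with one word per entry and block refill."""

    def __init__(self, size_bytes=64 * 1024, block_words=4):
        self.entries = size_bytes // WORD_BYTES
        self.block_words = block_words
        self.tags = [None] * self.entries  # None marks an invalid entry
        self.accesses = 0
        self.misses = 0

    def access(self, paddr):
        """Look up a physical byte address; return True on a hit."""
        word = paddr // WORD_BYTES
        index = word % self.entries
        tag = paddr >> 12  # assumed: tag holds address bits 31:12
        self.accesses += 1
        if self.tags[index] == tag:
            return True
        # Miss: refill the whole aligned block that contains the word.
        self.misses += 1
        first = word - (word % self.block_words)
        for w in range(first, first + self.block_words):
            self.tags[w % self.entries] = (w * WORD_BYTES) >> 12
        return False
```

Walking sequentially through 64 words, this model takes 64 misses with 1-word refill but only 16 with 4-word blocks, which is the sense in which block refill lowers the effective miss rate for code and data with spatial locality.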


• Memory controller interface for system (main) memory. This
interface also includes the logic and signals to allow operation 
with a write buffer to further improve memory bandwidth. In 
addition to the standard full word access, the memory controller 
supports the ability to write bytes and half-words by using partial 
word operations. The memory controller also supports the 
ability to retry memory accesses if, for example, the data 
returned from memory is invalid and a bus error needs to be 
signalled. 


• Coprocessor Interface: The IDT79R3000A features a tightly
coupled co-processor interface in which all co-processors 
maintain synchronization with the main processor; reside on the 
same data bus as the main processor; and participate in bus 
transactions in an identical manner to the main processor. The 
IDT79R3000A generates all required cache and memory 
control signals, including cache and memory addresses for 
attached coprocessors. As a result, only the data bus and a few 
control signals need to be connected to a coprocessor. 


The interface supports three types of coprocessor instructions: 
loads/stores, coprocessor operations, and  processor- 
coprocessor transfers. Note that coprocessor loads and stores 
occur directly between the coprocessor and memory, without 
requiring the data to go through the CPU. 


Synchronization between the CPU and external coprocessors
is achieved using a Phase-Locked Loop interface to the
coprocessor. The coprocessor physical interface also includes
coprocessor condition signals (CpCond(n)), which are used in
coprocessor branch instructions, and a coprocessor busy signal
(CpBusy) which is used to stall the CPU if the coprocessor
needs to hold off subsequent operations.


Finally, a precise exception interface is defined between the 
CPU and coprocessors using the external interrupt inputs of the 
CPU. This allows a coprocessor exception, even if it was the 
result of a multi-cycle operation, to be traced to the precise 
coprocessor operation which caused it. This is an important 
feature for languages which can define specific error handlers 
for each task. 


The interface supports up to four separate coprocessors. 
Coprocessor 0 is defined to be the system control coprocessor, 
and resides on the same chip as the CPU unit. Coprocessor 1 is 
the Floating Point Accelerator, IDT 79R3010A. Coprocessors 2 
and 3 are available to support an interface to application specific 
functions. 
MULTIPROCESSING SUPPORT


The IDT79R3000A supports multiprocessing applications in a 
simple but effective way. Multiprocessing applications require 
cache coherency across the multiple processors. The 
IDT79R3000A offers two signals to support cache coherency: the 
first, MPStall, stalls the processor within two cycles of being re- 
ceived and keeps it from accessing the cache. This allows an ex- 
ternal agent to snoop into the processor data cache. The second 
signal, MPInvalidate, causes the processor to write data on the 
data cache bus which indicates the externally addressed cache 
entry is invalid. Thus, a subsequent access to that location would 
result in a cache miss, and the data would be obtained from main
memory. 

The two MP signals would be generated by external logic which
utilizes a secondary cache to perform bus snooping functions. The
79R3000A does not impose an architecture for this secondary
cache, but rather is flexible enough to support a variety of
application specific architectures and still maintain cache coherency.
There is no impact on designs which do not require this feature.
In addition, the 79R3000A has improved on the multiprocessor
support found in the 79R3000 by allowing the use of cache RAMs
with internal address latches in multiprocessor systems.


ADVANCED FEATURES 


The IDT79R3000A offers a number of additional features such 
as the ability to swap the instruction and data caches, facilitating 
diagnostics and cache flushing. Another feature isolates the 
caches, which forces cache hits to occur regardless of the contents 
of the tag fields. The IDT79R3000A allows the processor to exe- 
cute user tasks of the opposite byte ordering (endianness) of the 
operating system, and further allows parity checking to be dis- 
abled. More details on these features can be found in the IDT 
79R3000A Family Hardware User’s Manual. 

Further features of the IDT79R3000A are configured during the 
last four cycles prior to the negation of the RESET input. These 
functions include the ability to select cache sizes and cache refill 
block sizes; the ability to utilize the multiprocessor interface; 
whether or not instruction streaming is enabled; whether byte ordering
follows “Big-Endian” or “Little-Endian” protocols, etc. Table
3 shows the configuration options selected at Reset. These are further
discussed in the “Hardware User’s Manual”.


BACKWARD COMPATIBILITY WITH 79R2000 


The IDT79R3000A can be used in sockets designed for the 
79R3000A. The pin-out of the 79R3000A has been selected to en- 
sure this compatibility, with new functions mapped onto previously 
unused pins. The instruction set is compatible with that of the 
79R2000 at the binary level. As a result, code written for the older 
processor can be executed. New features can be selectively 
disabled. 

In most 79R3000A applications, the 79R3000A can be placed in 
the socket with no modification to initialization settings. Further ap- 
plication assistance on this topic is available from IDT. 


PACKAGE THERMAL SPECIFICATIONS 


The IDT79R3000 utilizes special packaging techniques to im- 
prove both the thermal and electrical characteristics of the 
microprocessor. 

In order to improve the electrical characteristics of the device, the
package is constructed using multiple signal planes, including individual
power planes and ground planes to reduce noise associated
with high-frequency TTL parts. In addition, the 175-pin PGA
package utilizes extra power and ground pins to reduce the inductance
from the internal power planes to the power planes of the PC
Board.
In order to improve the thermal characteristics of the microprocessor,
the device is housed using cavity down packaging. In addition,
these packages incorporate a copper-tungsten thermal slug
designed to efficiently transfer heat from the die to the case of the
package, and thus effectively lower the thermal resistance of the
package. The use of an additional external heat sink affixed to the
package thermal slug further decreases the effective thermal
resistance of the package.

The case temperature may be measured in any environment to 
determine whether the device is within the specified operating 
range. The case temperature should be measured at the center of 
the top surface opposite the package cavity (the package cavity is 
the side where the package lid is mounted). 

The equivalent allowable ambient temperature, TA, can be calculated
using the thermal resistance from case to ambient (θca) for
the given package. The following equation relates ambient and
case temperature:
TA = Tc - P × θca
where P is the maximum power consumption, calculated by using 
the maximum Icc from the DC Electrical Characteristics section. 
Typical values for θca at various airflows are shown in Table 3 for
the various CPU packages. 
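
A worked example of the equation above may help. The sketch below assumes that P is obtained as maximum Icc multiplied by maximum Vcc, and every numeric value in the usage note is a hypothetical placeholder, not a datasheet figure. The function names are illustrative only.

```python
def max_power_watts(icc_max_amps, vcc_max_volts):
    """Worst-case power, assuming P = Icc(max) * Vcc(max)."""
    return icc_max_amps * vcc_max_volts


def allowable_ambient_c(case_temp_c, power_w, theta_ca_c_per_w):
    """TA = Tc - P * theta_ca, with theta_ca in degrees C per watt."""
    return case_temp_c - power_w * theta_ca_c_per_w
```

With hypothetical inputs of Tc = 85°C, Icc = 0.5 A, Vcc = 5.25 V, and θca = 10°C/W, the power is 2.625 W and the allowable ambient temperature works out to 85 - 26.25 = 58.75°C. A lower θca (more airflow or a heat sink) raises the allowable ambient temperature for the same case temperature limit.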






[Table values are not recoverable from the source scan. Rows: θca (175-PGA, 144-PGA) and θca (172 Quad Flatpack), each at various airflows.]

Table 3. Thermal Resistance (θca) at Various Airflows
INPUT   W CYCLE           X CYCLE           Y CYCLE           Z CYCLE
Int0    DBlkSize0         DBlkSize1         Extend Cache      BigEndian
Int1    IBlkSize0         IBlkSize1         MPAdrDisable      TriState
Int2    DispPar/RevEnd    IStream           IgnoreParity      NoCache
Int3    Reserved(1)       StorePartial      MultiProcessor    BusDriveOn
Int4    PhaseDelayOn(2)   PhaseDelayOn(2)   PhaseDelayOn(2)   PhaseDelayOn(2)
Int5    R3000 Mode(2)     R3000 Mode(2)     R3000 Mode(2)     R3000 Mode(2)


NOTES: 
1. Reserved entries must be driven high. 
2. These values must be driven stable throughout the entire RESET period. 


Table 3: IDT79R3000A Mode Selectable Features 
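
The mode-select table can be expressed as a lookup keyed by interrupt input and reset cycle. The following is a sketch only: the feature names are transcribed as best they can be read from the table, the function and variable names are hypothetical, and the code reports raw sampled levels without assuming which level enables a feature.

```python
CYCLES = ("W", "X", "Y", "Z")

# Feature sampled on each interrupt input in the W, X, Y, and Z cycles.
MODE_TABLE = {
    0: ("DBlkSize0", "DBlkSize1", "Extend Cache", "BigEndian"),
    1: ("IBlkSize0", "IBlkSize1", "MPAdrDisable", "TriState"),
    2: ("DispPar/RevEnd", "IStream", "IgnoreParity", "NoCache"),
    3: ("Reserved", "StorePartial", "MultiProcessor", "BusDriveOn"),
    4: ("PhaseDelayOn",) * 4,
    5: ("R3000 Mode",) * 4,
}


def decode_reset_modes(samples):
    """Map feature name -> sampled level.

    samples maps each cycle letter to a list of six levels for Int0..Int5.
    """
    modes = {}
    for col, cycle in enumerate(CYCLES):
        for n in range(4):
            modes[MODE_TABLE[n][col]] = samples[cycle][n]
    # Note 1: reserved entries must be driven high.
    if modes["Reserved"] != 1:
        raise ValueError("Reserved entry must be driven high")
    # Note 2: Int4 and Int5 must be stable for the whole RESET period.
    for n in (4, 5):
        levels = {samples[cycle][n] for cycle in CYCLES}
        if len(levels) != 1:
            raise ValueError(f"Int{n} must be stable during RESET")
        modes[MODE_TABLE[n][0]] = levels.pop()
    return modes
```

A caller would pass the levels captured in the last four cycles before RESET is negated, for example `decode_reset_modes({"W": [1]*6, "X": [1]*6, "Y": [1]*6, "Z": [1]*6})`.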


[Figure 11 diagram: the IDT79R3000A processor with system control coprocessor, connected to the instruction and data caches (AdrLo, Tag, TagV, TagP, Data, DataP, IClk, DClk, and the cache read and write enables), to the memory interface (XEn, SysOut, AccTyp[2:0], MemRd, MemWr, RdBusy, WrBusy, BusError), and to the coprocessors (CpSync, Run, Exception, CpBusy, CpCond[0], CpCond[3:1], Int[5:0]), with clock inputs Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi and the Reset input.]





Figure 11. IDT79R3000A Subsystem Interfaces Example; 64 KB Caches 
PIN CONFIGURATION

[172-pin ceramic flatpack pinout diagram; individual pin assignments are not recoverable from the source scan.]

172-PIN CERAMIC FLATPACK 
(Top View) 
NOTES: 


1. Reserved pins must not be connected. 


2. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. 
AdrLo 16: MP Invalidate, CpCond (2). 
AdrLo 17: MP Stall, CpCond (3). 


[144-pin PGA pinout grid (rows A to P, columns 1 to 15); individual pin assignments are not recoverable from the source scan.]

144-Pin PGA (Top View) 
NOTE: 


1. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time.
AdrLo16: MP Invalidate, CpCond (2). 
AdrLo17: MP Stall, CpCond (3). 
[175-pin PGA pinout grid (rows A to Q, columns 1 to 15); individual pin assignments are not recoverable from the source scan.]

175-Pin PGA (Top View) 
NOTE: 


1. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time.
AdrLo16: MP Invalidate, CpCond (2).
AdrLo17: MP Stall, CpCond (3). 
PIN DESCRIPTIONS

PIN NAME        I/O   DESCRIPTION
Data (0-31)     I/O   A 32-bit bus used for all instruction and data transmission among the processor, caches, memory interface, and coprocessors.
DataP (0-3)     I/O   A 4-bit bus containing even parity over the data bus.
Tag (12-31)     I/O   A 20-bit bus used for transferring cache tags and high addresses between the processor, caches, and memory interface.
TagV            I/O   The tag validity indicator.
TagP (0-2)      I/O   A 3-bit bus containing even parity over the concatenation of TagV and Tag.
AdrLo (0-17)    O     An 18-bit bus containing byte addresses used for transferring low addresses from the processor to the caches and memory interface. (AdrLo 16: CpCond (2), AdrLo 17: CpCond (3) set by reset initialization).
IRd1            O     Read enable for the instruction cache.
IWr1            O     Write enable for the instruction cache.
IRd2            O     An identical copy of IRd1 used to split the load.
IWr2            O     An identical copy of IWr1 used to split the load.
IClk            O     The instruction cache address latch clock. This clock runs continuously.
DRd1            O     The read enable for the data cache.
DWr1            O     The write enable for the data cache.
DRd2            O     An identical copy of DRd1 used to split the load.
DWr2            O     An identical copy of DWr1 used to split the load.
DClk            O     The data cache address latch clock. This clock runs continuously.
XEn             O     The read enable for the Read Buffer.
AccTyp (0-2)    O     A 3-bit bus used to indicate the size of data being transferred on the data bus, whether or not a data transfer is occurring, and the purpose of the transfer.
MemWr           O     Signals the occurrence of a main memory write.
MemRd           O     Signals the occurrence of a main memory read.
BusError        I     Signals the occurrence of a bus error during a main memory read or write.
Run             O     Indicates whether the processor is in the run or stall state.
Exception       O     Indicates that the instruction about to commit state should be aborted and other exception related information.
SysOut          O     A reflection of the internal processor clock used to generate the system clock.
CpSync          O     A clock which is identical to SysOut and used by coprocessors for timing synchronization with the CPU.
RdBusy          I     The main memory read stall termination signal. In most system designs RdBusy is normally asserted and is deasserted only to indicate the successful completion of a memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy          I     The main memory write stall initiation/termination signal.
CpBusy          I     The coprocessor busy stall initiation/termination signal.
CpCond (0-1)    I     A 2-bit bus used to transfer conditional branch status from the coprocessors to the main processor.
CpCond (2-3)    I     Conditional branch status from coprocessors to the processor. Function is provided on AdrLo 16/17 pins and is selected at reset time.
MPStall         I     Multiprocessing Stall. Signals to the processor that it should stall accesses to the caches in a multiprocessing environment. This is physically the same pin as CpCond3; its use is determined at RESET initialization.
MPInvalidate    I     Multiprocessing Invalidate. Signals to the processor that it should issue invalidate data on the cache data bus. The address to be invalidated is externally provided. This is the same pin as CpCond2; its use is determined at RESET initialization.
Int (0-5)       I     A 6-bit bus used by the memory interface and coprocessors to signal maskable interrupts to the processor. At reset time, mode select values are read in.
Clk2xSys        I     The master double frequency input clock used for generating SysOut.
Clk2xSmp        I     A double frequency clock input used to determine the sample point for data coming into the processor and coprocessors.
Clk2xRd         I     A double frequency clock input used to determine the enable time of the cache RAMs.
Clk2xPhi        I     A double frequency clock input used to determine the position of the internal phase1 and [remainder of description missing in source].
Reset           I     Synchronous initialization input used to force execution starting from the reset memory address. Reset must be deasserted synchronously but asserted asynchronously. The deassertion of reset must be synchronized by the leading edge of SysOut.
ABSOLUTE MAXIMUM RATINGS(1, 3)

SYMBOL    RATING                                 COMMERCIAL             MILITARY              UNIT
VTERM     Terminal Voltage with Respect to GND   -0.5 to +7.0           -0.5 to +7.0          V
TA, TC    Operating Temperature                  0 to +70 (Ambient)     -55 to +125 (Case)    °C
TSTG      Storage Temperature                    -55 to +125            -65 to +150           °C
VIN       Input Voltage(2)                       -0.5 to +7.0           -0.5 to +7.0          V

[One further row of this table is not recoverable from the source scan.]


NOTES: 

1. Stresses greater than those listed under ABSOLUTE MAXIMUM RAT- 
INGS may cause permanent damage to the device. This is a stress rating 
only and functional operation of the device at these or any other conditions 
above those indicated in the operational sections of this specification is not 
implied. Exposure to absolute maximum rating conditions for extended pe- 
riods may affect reliability. 


2. VIN minimum = -3.0V for pulse width less than 15ns.
VIN should not exceed VCC + 0.5 Volts.


3. Not more than one output should be shorted at a time. Duration of the short 
should not exceed 30 seconds. 











AC TEST CONDITIONS 








[Table not recoverable from the source scan. Columns: SYMBOL, PARAMETER, MIN., MAX., UNIT. Rows: Input HIGH Voltage and Input LOW Voltage.]
RECOMMENDED OPERATING
TEMPERATURE AND SUPPLY VOLTAGE

GRADE         TEMPERATURE                 VCC
Military      -55°C to +125°C (Case)      5.0V ±10%
Commercial    0°C to +70°C (Ambient)      5.0V ±5%

OUTPUT LOADING FOR AC TESTING 











[Output loading circuit diagram: test load connected to the device under test.]
DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)

                                                            16.67MHz        20.0MHz
SYMBOL  PARAMETER                 TEST CONDITIONS           MIN.   MAX.     MIN.   MAX.    UNIT
VOH     Output HIGH Voltage       VCC = Min., IOH = -4mA    3.5    -        3.5    -       V
VOL     Output LOW Voltage        VCC = Min., IOL = 4mA     -      0.4      -      0.4     V
VOHC    Output HIGH Voltage       VCC = Min., IOH = -4mA    4.0    -        4.0    -       V
VOHT    Output HIGH Voltage(4)    VCC = Min., IOH = -8mA    2.4    -        2.4    -       V

[The remaining rows are not reliably recoverable from the source scan: VOLT, VIH, VIL, VIHS Input HIGH Voltage(2, 5), VILS Input LOW Voltage(1, 2), CIN Input Capacitance(6), COUT Output Capacitance(6), ICC, the input leakage currents, and the Output Tri-state Leakage. Legible fragments include maximum entries of 450 and 550 in the 16.67MHz and 20.0MHz columns and leakage limits of -40 to 40 µA.]

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give
the designer further information about these specific signals.
5. VIH should not be held above VCC + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to Run and Exception.
DC ELECTRICAL CHARACTERISTICS
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ±10%)

                                                            16.67MHz        20.0MHz
SYMBOL  PARAMETER                 TEST CONDITIONS           MIN.   MAX.     MIN.   MAX.    UNIT
VOH     Output HIGH Voltage       VCC = Min., IOH = -4mA    3.5    -        3.5    -       V
VOL     Output LOW Voltage        VCC = Min., IOL = 4mA     -      0.4      -      0.4     V
VOHT    Output HIGH Voltage(4)    VCC = Min.                2.4    -        2.4    -       V

[The remaining rows are not reliably recoverable from the source scan: VOHC, VOLT, VIH, VIL, VIHS Input HIGH Voltage, VILS Input LOW Voltage(1, 2), CIN Input Capacitance(6), COUT Output Capacitance, ICC, and the leakage currents. Legible fragments include leakage limits of -40 to 40.]

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give
the designer further information about these specific signals.
5. VIH should not be held above VCC + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to Run and Exception.








IDT79R3000A/AE RISC CPU PROCESSOR MILITARY AND COMMERCIAL TEMPERATURE RANGES 


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000A(1, 2, 3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)

[Timing table with MIN. and MAX. columns for each clock rate, starting with 16.67MHz and 20.0MHz. Most entries are not recoverable from the source scan; the rows below are those that remain legible and are the same in every column. Output timings are specified with Load = 25pF.]

SYMBOL     PARAMETER                              MIN.     MAX.      UNIT
Clock
           Clk2xSys to Clk2xSmp                   0        tcyc/4
           Clk2xSmp to Clk2xRd                    0        tcyc/4
           Clk2xSmp to Clk2xPhi                            tcyc/4
Reset Initialization
TRst       Reset Pulse Width                      6        -         Tcyc
TRstPLL    Reset timing, Phase-lock on(4, 5)      3000     -         Tcyc
TRstCP     Reset timing, Phase-lock off(4, 5)     128      -         Tcyc

[Rows whose values are not recoverable: Input Clock High (Note 7), Input Clock Low, and Input Clock Period(2) under Clock; the Run Operation and Stall Operation timings, including Exception and Exception Valid; and Load Derate(6) under Capacitive Load Deration.]

NOTES: 

1. All timings are referenced to 1.5V. 

2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.

3. This parameter is guaranteed by design. 

4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the 
longer of 3000 clock cycles or 200 microseconds. 

5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).

6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%. 

7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds. 
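
The reset rule in notes 4 and 5 can be checked numerically. The sketch below assumes that the "3000 clock cycles" in note 4 are CPU cycles (Tcyc), uses the 128 Tcyc figure read from the Reset Initialization rows for the phase-lock off case, and uses hypothetical function names.

```python
def cycle_time_ns(cpu_mhz):
    """Tcyc: one CPU clock cycle, equal to two cycles of a 2x clock."""
    return 1000.0 / cpu_mhz


def min_reset_ns(cpu_mhz, phase_lock_on):
    """Minimum Reset assertion time in nanoseconds."""
    tcyc = cycle_time_ns(cpu_mhz)
    if phase_lock_on:
        # Note 4: the longer of 3000 cycles or 200 microseconds.
        return max(3000 * tcyc, 200_000.0)
    return 128 * tcyc
```

Under these assumptions, at 20MHz the 3000-cycle term is only 150 microseconds, so the 200 microsecond floor governs with phase lock on; with phase lock off the minimum is 128 × 50ns = 6.4 microseconds.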




AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000AE(1, 2, 3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)

[Timing table with MIN. and MAX. columns for each clock rate, starting with 16.67MHz and 20.0MHz. Most entries are not recoverable from the source scan; the rows below are those that remain legible. Where several values are given, they are in column order of increasing clock rate. Output timings are specified with Load = 25pF.]

SYMBOL     PARAMETER                              MIN.                  MAX.      UNIT
Clock
TCkP       Input Clock Period(2)                  30 / 25 / 20          500
           Clk2xSys to Clk2xSmp                   0                     tcyc/4
           Clk2xSmp to Clk2xRd                    0                     tcyc/4
           Clk2xSmp to Clk2xPhi                   9 / 7 / 5 / 3.5       tcyc/4
Reset Initialization
TRst       Reset Pulse Width                      6                     -         Tcyc
TRstPLL    Reset timing, Phase-lock on(4, 5)      3000                  -         Tcyc
TRstCP     Reset timing, Phase-lock off(4, 5)

[Rows whose values are not recoverable: Input Clock High and Input Clock Low (Note 7) under Clock; the Run Operation timings, including Data Enable, Data Disable, Data Setup, and Data Hold; the Stall Operation timings, including Memory Read Terminate, Memory Write, and Exception Valid; and Load Derate(6) under Capacitive Load Deration.]

NOTES:

1. All timings are referenced to 1.5V.

2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.

3. This parameter is guaranteed by design.

4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the
longer of 3000 clock cycles or 200 microseconds.

5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).

6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.

7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds.

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AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000A"" 2 3) — 
MILITARY TEMPERATURE RANGE (Tc =-55°C to +125°C, Voc = +5.0V + 10%) 


16.67MHz 20.0MHz 
SYMBOL PARAMETER TEST CONDITIONS | \CS7MEIE || |. 200NE eta oe UNIT 


Ponce nER® Weer ee to ee 
Toxtow | Input Clock Low®) Note 7 ae a 


TcKP Input Clock Period(2) 
Clk2xSys to Clk2xSmp\®) tcyc/4 






0 teyc/4 
tcyc/4 
9 teyc/4 


tcyc/4 
teyc/4 


Clk2xSmp to Clk2xRd‘) 
Clk2xSmp to Clk2xPhi(®) 











Run Operation 


ToEn 
To's ph eee] 
Toval Load=25pF_ | — 3 
Two! Load=25pF_ | — 5 
ee 
[25 =| 





| 
i. 
in 





Tos 
To gee 
Toss 30 | tt = 8 
pes 
Tact Load = 25pF pote sor ad 
Tat2 | Access Type (2) Load = 25pF ae —- 4 | — 12 | 
Load = 25pF 
Texe Load=25pF_ =| — 7 | ~— 7 | — 5 | 
Address Valid Load=25pF_ | — 2 | — 2 | — 2 | 
PSB os 
aes, a | 


| 





Tints Int(n) Set-up 
TintH Int(n) Hold 
Stall Operation 


Tsaval Load=25pF_ | — _30_| 
Tact Load = 25pF 
Turd Load = 25pF 1 27 

Tunes Load = 25pF = 27 

Tsu Load = 25pF 
Thun Load = 25pF 
Tsmwr Load = 25pF 
TSExe Exception Valid Load = 25pF pear. a | 


Reset Initialization 


TRst Reset Pulse Width 


TrstpLL | Reset timing, Phase—lock on(4 5) 
Trstep Reset timing, Phase—lock off(4: 5) 
Capacitive Load Deration 


OLD [load Dest SS] SSSSCSC~idC— TCC _*si 

NOTES: 

1. All timings are referenced to 1.5V. 

2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. 

3. This parameter is guaranteed by design. 

4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the 
longer of 3000 clock cycles or 200 microseconds. 

5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).

6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%. 

7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds. 
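
As a rough illustration of Note 4, the sketch below computes the minimum Reset assertion time with phase lock on as the longer of 3000 clock cycles or 200 microseconds. It assumes that the "clock cycles" of Note 4 are CPU clock cycles (Tcyc); the function name and its parameters are illustrative only and do not come from this data sheet.

```python
def min_reset_assert_us(cpu_clock_mhz, min_cycles=3000, min_us=200.0):
    """Minimum Reset assertion time in microseconds with phase lock on.

    Assumption: the 3000 "clock cycles" of Note 4 are CPU clock cycles (Tcyc).
    """
    tcyc_us = 1.0 / cpu_clock_mhz  # one CPU clock cycle in microseconds
    return max(min_cycles * tcyc_us, min_us)


if __name__ == "__main__":
    for mhz in (16.67, 20.0):
        print(f"{mhz} MHz: {min_reset_assert_us(mhz):.1f} us")
```

Under this assumption, 3000 cycles last about 180 microseconds at 16.67MHz and 150 microseconds at 20MHz, so the 200 microsecond limit governs at both speeds listed in this table.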


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000AE(1, 2, 3)
MILITARY TEMPERATURE RANGE (Tc = -55°C to +125°C, Vcc = +5.0V ± 10%)


16.67MHz 20.0MHz 
SYMBOL PARAMETER TEST CONDITIONS MIN. MAX. | MIN. MAX. | UNIT









Clock 


Input Clock High(2)
Input Clock Low(2)


TckP Input Clock Period(2)
Clk2xSys to Clk2xSmp(5)







tcyc/4 tcyc/4
Clk2xSmp to Clk2xRd(5)
Clk2xSmp to Clk2xPhi(5)
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Stall Operation 
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Twrai 
Treat Memory Read Terminate 
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Thun 
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TsExc Exception Valid | Load=25pF | O— S15 
Reset Initialization 
Tast 
Tis 


TRstCP Reset timing, Phase-lock off(4, 5)
Capacitive Load Deration


CLD | Load Derate(6)


NOTES: 

1. All timings are referenced to 1.5V. 

2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. 

3. This parameter is guaranteed by design. 

4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the 
longer of 3000 clock cycles or 200 microseconds. 

5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).

6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.

7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds. 
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IDT79R3000A/AE RISC CPU PROCESSOR MILITARY AND COMMERCIAL TEMPERATURE RANGES 


Signals shown: Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi. Timing parameters shown: TckLow, TckP, Trd.


Figure 12. Input Clock Timing


Signals shown: SysOut, SmpOut*

* These signals are not actually output from the processor. They are drawn to provide
a reference for other timing diagrams.


Figure 13. Processor Reference Clock Timing
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Figure 14. Synchronous Memory (Cache) Timing 
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Figure 15. Memory Write Timing 
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Figure 16. Memory Read Timing 
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Co-Processor Store Co-Processor Load 


Signals shown for both operations: Phase, SysOut, Data Bus, CpBusy, Exception, CpCond(n)


Figure 17. Co-Processor Load/Store Timing


Signals shown: Phase, SysOut, PhiOut, Int(n)


Figure 18. Interrupt Timing








Signal shown: Reset. Timing parameter shown: TIntH.


NOTES:
1. Reset must be negated synchronously; however, it can be asserted asynchronously. Designs should not rely on the proper functioning of SysOut prior to the assertion of Reset.
2. If Phase-Lock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to ensure that the slowest co-processor in the system has sufficient time to lock the CPU clocks.
3. Reset is actually sampled in both Phase 1 and Phase 2. To ensure proper initialization, it is recommended that Reset be negated relative to the end of Phase 1.

Figure 19. Mode Vector Initialization




ORDERING INFORMATION 


IDT XXXXX XX X X
Device Type Speed Package Process/Temp. Range
Blank Commercial (0°C to +70°C) 
B Military (-55°C to +125°C) 
Compliant to MIL-STD-883, Class B 


M Military Temperature Range Only 
G 175 175-Pin PGA (Cavity Down)

G 144 144-Pin PGA (Cavity Down) 

F 172-Pin Flat Pack 

16 16.67 MHz 

20 20.0 MHz 

25 25.0 MHz 

33 33.33 MHz 


79R3000A RISC CPU Processor 
79R3000AE Enhanced Timing Version


RISController™ PRELIMINARY
IDT79R3001


Integrated Device Technology, Inc. 





FEATURES: 


• Enhanced Instruction Set compatible version of IDT79R3000
  RISC CPU
• Achieves high performance with reduced parts count and
  lower overall system cost
• Flexible on-chip cache controller supports various cache and
  main memory sizes
• Supports optional data parity with parity error output signal
• Works with IDT79R3010 RISC Floating-Point Coprocessor
• DMA interface support
• Large synchronous memory space for real-time systems
• Full 32-bit operations: 32-bit registers, 32-bit address and
  data interface
• On-chip memory management unit with 64 fully associative
  TLB entries maps 4 Gbyte virtual address space
• High-speed interrupt response (6 interrupt input pins) with
  precise exception capability
• High-speed CEMOS™ technology results in speeds from
  12.5 to 25MHz
• Supports caches from 8 Kbytes to 16 Mbytes
• Independent block refill sizes for the instruction and data
  caches
• Concurrent cache refill and execution
• Works on 8-, 16- and 32-bit data
• Supports unaligned 32-bit data
• Optimizing compilers for C, Ada, Pascal, Fortran
• RTOS support for C or Ada environments


DESCRIPTION: 


The IDT79R3001 brings the high performance inherent in the
IDT79R3000 RISC Microprocessor to lower cost systems. It does
this while maintaining full (both User and Kernel) software
compatibility with both the IDT79R2000A and IDT79R3000 RISC
Microprocessors.

The IDT79R3001 achieves lower system cost by reducing the 
number of components required to construct a synchronous mem- 
ory (or cache) external to the processor and by simplifying the 
asynchronous memory interface. By removing the requirement for 
parity and allowing the system designer to select the cache organi- 
zation which best suits the system, overall parts count is dramati- 
cally reduced while maintaining high performance. 
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Figure 1. IDT79R3001 Block Diagram 


CEMOS and RiSController are trademarks of Integrated Device Technology, Inc. 
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The IDT79R3001 RISC Microprocessor extends the ability of the 
IDT79R3000 family to support embedded and cost sensitive appli- 
cations. Its level of integration and flexibility allows high- 
performance systems to be constructed at reasonable cost in a 
straightforward manner, without forcing the system designer to 
support features not required in his application. 

The IDT79R3001 consists of two tightly coupled processors
integrated on a single chip. The first processor is a full 32-bit CPU
based on RISC principles to achieve a new standard of performance
in microprocessor based systems. The second processor is
a system control co-processor, called CP0, containing a fully
associative 64-entry TLB (Translation Lookaside Buffer), MMU
(Memory Management Unit), and control registers, supporting a 4
Gigabyte virtual memory subsystem and a Harvard Architecture
Synchronous Memory/Cache controller which achieves ultra-high
bandwidth using industry standard SRAM devices.

This data sheet provides an overview of the features and archi- 
tecture of the IDT79R3001 CPU. A more detailed description of 
the operation and timing of this device is incorporated in the 
“IDT79R3001 Hardware User’s Guide”, and a detailed architec- 
tural overview is provided in the “mips RISC Architecture” book, 
both available from IDT. Further literature describing the hardware,
software, and development tools for the IDT79R3001 is
also available from IDT.


HARDWARE OVERVIEW 


The IDT79R3001 is a high-performance RISC microprocessor 
incorporating a fast execution engine and sophisticated yet flexible 
memory interface designed to support the processor bandwidth re- 
quirements at minimal system cost. 


Execution Engine 


The IDT79R3001 contains the same basic execution engine as 
the ultra-high performance IDT79R3000 and thus achieves over 
20 MIPS performance at 25 MHz. 

The key to the performance of the processor is the instruction 
pipeline, illustrated in Figure 2. The execution of a single 
IDT79R3001 instruction consists of five primary steps, some of 
which may be broken down further into smaller subsets. 






Instruction Flow through the pipeline stages: IF, RD, ALU, MEM, WB (the figure marks the span of One Cycle)


Figure 2. IDT79R3001 Five-Stage Pipeline


The five primary stages of the pipeline, each of which requires
approximately one CPU cycle, are:


IF Instruction Fetch: the processor fetches the instruction
from the Instruction Synchronous Memory.

RD Read required operands from the on-chip register file
while decoding the instruction.

ALU Perform the required operation on instruction operands.

MEM Access data memory (load or store).

WB Write results back to the register file.


Thus, the CPU achieves an average execution rate approaching
one instruction per CPU cycle, since the execution of five
instructions at a time is overlapped within the processor (Figure 3).
Optimizing compiler technology fully comprehends the interaction of
software with the various pipeline resources, and serves both to
eliminate any potential pipeline conflicts which might arise and to
maximize instruction throughput.
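
The overlap described above can be sketched with a small model. This is a minimal illustration that assumes an ideal pipeline with no stalls or exceptions, in which instruction i enters the IF stage in cycle i; the function names are illustrative and are not taken from this data sheet.

```python
STAGES = ("IF", "RD", "ALU", "MEM", "WB")


def stage_at(instruction_index, cycle):
    """Return the stage occupied by an instruction in a given cycle, or None.

    Assumption: instruction i enters IF in cycle i and never stalls.
    """
    position = cycle - instruction_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions in the ideal pipeline."""
    if n_instructions <= 0:
        return 0
    return len(STAGES) + n_instructions - 1


def cycles_per_instruction(n_instructions):
    """Average cycles per instruction for a run of n instructions."""
    return total_cycles(n_instructions) / n_instructions


if __name__ == "__main__":
    # In cycle 4, five instructions are in flight, one in each stage.
    print([stage_at(i, 4) for i in range(5)])
    print(cycles_per_instruction(1000))
```

In this idealized model a run of 1000 instructions takes 1004 cycles, which is why the average rate approaches, but does not reach, one instruction per cycle.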


Label shown: Current CPU Cycle


Figure 3. Instruction Execution in IDT79R3001 Pipeline
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The IDT79R3001 Memory Interfaces 


The key to achieving the inherent performance of the 
IDT79R3001 is to design a memory subsystem capable of provid- 
ing a new instruction to the processor on almost every clock cycle. 

Like the IDT79R3000, the IDT79R3001 supports a hierarchical 
view of the memory subsystem. However, the IDT79R3001 allows 
the system designer to make more trade-offs in the partitioning 
and architecture of the various levels in order to more completely 
meet the needs of certain types of applications. 

The IDT79R3001 supports two classifications of external memory:
synchronous and asynchronous. The Harvard Architecture
(separate instruction and data memories) synchronous memory
allows the processor to achieve the highest levels of performance.
The processor is able to obtain both an instruction and data word
from the synchronous memory on every clock cycle, resulting in
high instruction and data throughput.

The asynchronous memory space contains larger, slower mem- 
ory devices such as EPROM, main memory DRAMs, and periph- 
eral devices. Multiple clock cycles are required for data movement 
in the asynchronous memory. 

Many systems implement a memory hierarchy between these
two memory spaces, whereby the synchronous memory space is
used as processor caches and the asynchronous memory space is
used for main memory. The IDT79R3001 integrates a flexible
Direct-Mapped Cache Controller on-chip, eliminating external
cache control logic and minimizing cache management overhead.
If the synchronous memory space is used for processor caches,
then cache “misses” will cause the processor to automatically
process an asynchronous memory transfer to refill the cache.

The key to achieving the system cost and performance goals of 
an IDT79R3001-based system is to partition the memory system 
to the needs of the application. 


Synchronous Memory System 


As with any high-performance processor, the IDT79R3001
requires high bandwidth to achieve high performance. Thus, it is
important that the majority of its execution occur in the synchronous
memory space. In applications which require substantial
amounts of main memory, this memory space will be implemented
as instruction and data caches.

The synchronous memory is designed to be able to supply both
an instruction and data word to the processor on each clock cycle.
When the synchronous memory spaces are used as caches,
they hold instructions and data that are repeatedly
accessed by the CPU (for example, within a program loop). This
reduces the number of slower asynchronous memory cycles and
thus achieves higher performance.

Phases shown: 1 (Instruction Read), 2 (Data Read). Signals shown include AddrLo, Instr. RAM, CPU Data Pins.


Figure 4. Synchronous Memory Control Timing


Some microprocessors incorporate small amounts of cache
on-chip, which has a very small and unpredictable effect on the
execution of large programs. The IDT79R3001 supports caches from
8KB up through 16MB in size, thus bringing substantial performance
improvements to very large programs and also allowing
real-time system designers to design cache-based systems to support
deterministic requirements.

The IDT79R3001 directly controls the synchronous memory
interface (whether it is being used as caches or not) with a minimum
of external components. The IDT79R3001 includes all control
signals and cache TAG control logic (for a direct mapped cache) for
the synchronous memory interfaces. Parity over the data portion
of each synchronous memory can be optionally selected at RESET
time for applications which desire to make this cost trade-off.

The synchronous interface works by dividing the basic CPU
cycle into two phases. During one phase, a cache address is
presented by the processor and captured by external latches (the
latch control signals are directly generated by the CPU). During
the next phase, the address for the other memory space is
generated and captured while the data movement operation for the first
cache is completed. The processor directly generates the SRAM
Output Enable and Write Enable signals and the address latch
enable signals, requiring no external decoding. This is illustrated in
Figure 4.

Further, the IDT79R3001 supports the ability to refill multiple
words into the cache from main memory when a cache miss
occurs, further reducing system cost and increasing performance in
cache-based systems. The IDT79R3001 can obtain 1, 4, 8, 16, or
32 words from main memory when processing a cache miss, thus
amortizing the cache-miss penalty over a large amount of data.

The IDT79R3001 also performs instruction streaming, which is 
the simultaneous execution of incoming instructions while the 
cache is being refilled. 

The actual width of the TAG bus, and whether or not parity is used
over the data portion of each synchronous memory, are determined
by how the device is initialized. The IDT79R3001 can accommodate
a TAG bus width of 0-19 bits, compatible with a variety of
cache sizes and cacheable main memory choices. The
IDT79R3001 allows the system designer to scale the synchronous
memory system exactly according to the system needs, thus
eliminating extra memory and logic devices and achieving substantial
cost savings with no loss of performance.

Thus, the synchronous memory interface of the IDT79R3001
allows for high-bandwidth memory systems to be implemented with
a minimum of control logic. This is desirable, since RISC performance
tends to be a function of memory bandwidth. By simplifying
the design of the synchronous memory system (illustrated in Figure 5),
it is easier for the system designer to achieve high performance
with minimum chip count and without requiring ultra-fast or
specialty components.
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Figure 5. IDT79R3001 Synchronous Interface 


The TAG Bus 


The TAG bus of the IDT79R3001 has been designed to allow the 
system designer to implement the exact cache configuration that is 
right for the system. For larger caches, low-order TAG bits do not 
need to be supplied for the TAG comparison. Additionally, the 
number of high-order TAG bits supplied is determined by the sys- 
tem designer, according to the amount of cacheable main memory 
the system supports. Since most embedded systems would tend 
to implement caches of 16KB and greater, and cacheable memory 
spaces of 32MB or smaller, significant cost and area reductions 
are achieved by configuring a smaller TAG bus. 

The system configures the on-chip TAG comparator at RESET
initialization time. If a TAG bit is not to be included in the synchronous
memory TAG bit compare, a pull-down resistor of 4kΩ is
connected to the appropriate IDT79R3001 TAG pin. If a TAG bit is to
be included, no resistor is required (the IDT79R3001 pulls floating
inputs to Vcc during RESET by a small pull-up, which is disabled
when RESET is negated).

If a TAG bit is excluded from the cycle-by-cycle comparison, it is
still driven out with the appropriate address value during write
cycles or asynchronous memory reads. Thus, the system designer
still has the full 4 Gbyte of address space available for address
decoding, without requiring the synchronous memory to be able to
cache all such addresses.
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Figure 6 illustrates a reduced system, which implements 16KB of 
Instruction and 16KB of data cache, and 512MB of cacheable ad- 
dress space, using just 6 IDT71586 4Kx16 Latched CacheRAM™ 
components and 4 pull-down resistors. 
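
The TAG bus sizing in this example can be checked with a short calculation. The sketch below assumes a direct-mapped cache whose index uses the low-order address bits, and it assumes that the 19-bit TAG bus corresponds to the address bits above the smallest (8KB) cache index; that mapping is inferred from the figures quoted in this data sheet and is not stated in it. All function names are illustrative.

```python
ADDRESS_BITS = 32             # 4 Gbyte address space
MAX_TAG_BITS = 19             # widest TAG bus quoted in this data sheet
MIN_CACHE_BYTES = 8 * 1024    # smallest supported cache


def log2_exact(n):
    """Return log2(n) for a power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def tag_bits_compared(cache_bytes, cacheable_bytes):
    """TAG bits that take part in the cycle-by-cycle comparison."""
    return log2_exact(cacheable_bytes) - log2_exact(cache_bytes)


def pull_down_resistors(cache_bytes, cacheable_bytes):
    """TAG pins excluded from the comparison (one pull-down each)."""
    # Low-order TAG bits become index bits as the cache grows beyond 8KB.
    low_excluded = log2_exact(cache_bytes) - log2_exact(MIN_CACHE_BYTES)
    # High-order TAG bits are not needed if less than 4 Gbyte is cacheable.
    high_excluded = ADDRESS_BITS - log2_exact(cacheable_bytes)
    return low_excluded + high_excluded


if __name__ == "__main__":
    cache, cacheable = 16 * 1024, 512 * 2**20
    print(tag_bits_compared(cache, cacheable), pull_down_resistors(cache, cacheable))
```

For 16KB caches and 512MB of cacheable address space this gives 15 compared TAG bits and 4 excluded pins, which agrees with the 4 pull-down resistors quoted for the reduced system.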

Note that in systems which do not implement the synchronous
memory space as cache, pull-down resistors would be added
to all TAG pins. The Valid pin still needs to be supplied on each
cycle, thus allowing various memory schemes to be implemented
(such as static column DRAM). However, the IDT79R3001 can be
initialized to not assert the Valid pin as an output during Write
cycles, simplifying the design of logic to drive the signal.
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Figure 6. Small Footprint Cache for IDT79R3001 


Cache Update 


When the on-chip TAG comparator indicates that the item read
from the cache was not the desired item, a cache miss is
processed: a main memory (asynchronous) transfer is automatically
initiated.

The IDT79R3001 is designed to update the cache using a burst refill
of multiple adjacent words from main memory. The processor is
“stalled” until the first word of the block is available. The processor
is then released, and the block of words is brought into the cache at
the rate of one word per CPU clock cycle.
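
The effect of the block size on the refill cost can be sketched as follows. This is an illustrative model only: it assumes the first word arrives after a fixed number of cycles (a property of the main memory design, so the value used below is hypothetical) and that the remaining words follow at one word per CPU clock cycle, as described above.

```python
BLOCK_SIZES = (1, 4, 8, 16, 32)  # refill sizes in words listed in this data sheet


def refill_cycles(block_words, first_word_latency):
    """Cycles from the start of the miss until the whole block is in the cache.

    first_word_latency is the number of cycles until the first word is
    available; it is a hypothetical, system-dependent parameter.
    """
    if block_words not in BLOCK_SIZES:
        raise ValueError(f"unsupported block size: {block_words}")
    return first_word_latency + (block_words - 1)


def cycles_per_word(block_words, first_word_latency):
    """Average refill cost per word brought into the cache."""
    return refill_cycles(block_words, first_word_latency) / block_words


if __name__ == "__main__":
    for words in BLOCK_SIZES:
        print(words, refill_cycles(words, 6), cycles_per_word(words, 6))
```

With a hypothetical first-word latency of 6 cycles, a single-word refill costs 6 cycles per word, while a 32-word refill takes 37 cycles, or about 1.16 cycles per word.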

Note that if the cache miss was in the instruction cache, the
processor is capable of simultaneously executing the incoming
instruction stream as the cache is updated, thus effectively making
the cache update transparent to the system and increasing
performance.


Write Cycles 


The IDT79R3001 utilizes a write-through cache. That is, data
written by the processor is written to both the cache and main
memory simultaneously. Thus, main memory always has a current
copy of all data.

Typically, latching devices are used between the cache subsystem
and the slower main memory. These Write Buffers capture the
data simultaneously with the cache update, allowing the processor
to continue to the next cycle without actually waiting for the main
memory transfer to complete. The IDT79R3001 generates parity
over the data field on write cycles, which can be propagated into
both the synchronous and asynchronous memory spaces.

When the processor writes less than a 32-bit quantity (a “partial”
word), the processor can perform a “read-modify-write” of the
cache. That is, the processor will read the 32-bit word containing
the partial address(es) to be updated from the cache. If a “hit”
occurs, then the new data will be merged with the old and the new
32-bit value will be written both to the cache and to main memory.
If a cache “miss” occurs, then only the partial data is written to main
memory and the cache is unchanged. Partial word capability is
selected as a RESET option.
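
The partial-word behavior described above can be modeled in a few lines. This is a sketch under stated assumptions: the 4-bit byte-enable encoding (bit i selects data bits 8i to 8i+7) and the function names are hypothetical and chosen only for illustration; the actual byte selection is conveyed by the processor's access type signals.

```python
WORD_MASK = 0xFFFFFFFF


def lane_mask(byte_enables):
    """Expand a 4-bit byte-enable value into a 32-bit data mask.

    Assumption: bit i of byte_enables selects data bits 8*i .. 8*i+7.
    """
    mask = 0
    for lane in range(4):
        if byte_enables & (1 << lane):
            mask |= 0xFF << (8 * lane)
    return mask


def partial_store(cache_hit, cached_word, store_data, byte_enables):
    """Return (new_cache_word, memory_write) for a partial-word store.

    new_cache_word is None when the cache is left unchanged.
    memory_write is (data, byte_enables) as presented to main memory.
    """
    mask = lane_mask(byte_enables)
    if cache_hit:
        # Hit: merge new bytes into the old word, write the full word to both.
        merged = (cached_word & ~mask & WORD_MASK) | (store_data & mask)
        return merged, (merged, 0b1111)
    # Miss: only the partial data goes to main memory; cache is untouched.
    return None, (store_data & mask, byte_enables)


if __name__ == "__main__":
    print(partial_store(True, 0x11223344, 0x0000AB00, 0b0010))
    print(partial_store(False, 0x11223344, 0x0000AB00, 0b0010))
```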


THE ASYNCHRONOUS MEMORY INTERFACE


The IDT79R3001 also supports an asynchronous memory interface,
which accommodates slower memory devices such as
slow DRAM and EPROM, as well as peripherals and
other “non-cacheable” devices.

In general, if a cache miss (or parity error, if enabled) occurs, the
processor will automatically use the asynchronous memory interface
to retrieve the desired data, and will update the cache
accordingly.

Additionally, software can force the use of the asynchronous
memory space through the use of the on-chip MMU. When the
processor seeks either instructions or data within a certain address
range (kseg1), the processor knows that this data is uncacheable
and will perform an asynchronous memory transfer. Additionally,
within cacheable memory, TLB entries can be used to mark certain
pages as “uncacheable”. When an address of an “uncacheable”
page is used, the processor will automatically use the asynchronous
memory space.
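
The cacheability decision described above can be sketched as follows. This data sheet names only kseg1; the full segment map used below (kuseg, kseg0, kseg1, kseg2 and their boundaries) is taken from the MIPS-I architecture rather than from this document, and the function names and the tlb_page_uncacheable parameter are illustrative assumptions.

```python
def segment(vaddr):
    """Classify a 32-bit virtual address into its MIPS-I segment."""
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        return "kuseg"   # mapped through the TLB
    if vaddr < 0xA0000000:
        return "kseg0"   # unmapped, cached
    if vaddr < 0xC0000000:
        return "kseg1"   # unmapped, uncached
    return "kseg2"       # mapped through the TLB


def is_uncacheable(vaddr, tlb_page_uncacheable=False):
    """True if an access must use the asynchronous memory space directly.

    tlb_page_uncacheable stands in for the "uncacheable" marking of the
    TLB entry that maps the page (only relevant for mapped segments).
    """
    seg = segment(vaddr)
    if seg == "kseg1":
        return True
    if seg == "kseg0":
        return False
    return tlb_page_uncacheable


if __name__ == "__main__":
    for addr in (0x00400000, 0x80001000, 0xBFC00000):
        print(hex(addr), segment(addr), is_uncacheable(addr))
```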

The asynchronous memory space uses the same data bus as
the synchronous memory space. This facilitates the automatic
updating of cache memory when the asynchronous memory is
accessed due to cache-miss activity or memory writes. The
asynchronous address bus is composed from the synchronous
memory AddrLo bus and the TAG bus. External logic devices (such as
IDT74FCT374A registers) are used to capture AddrLo and TAG
values for the asynchronous transfer address. Note that systems
which exclude individual TAG bits from comparison (to reduce
cache width) still have all TAGs available as outputs.

The data path between the processor and the asynchronous
memory space is managed according to the needs of the application.
Write Buffer FIFO devices, such as the IDT79R3020, are
used to capture address and data during store cycles. These
devices capture the data in one cycle, and allow the processor
to continue to execute from the synchronous memory while
the slower asynchronous memory actually retires the write.

The read path is also constructed according to the needs of the 
system. If block refill is used, then the read path is highly depend- 
ent on the design of the main memory system. Pipeline devices 
such as IDT74FCT520A, or simple latches such as IDT74FCT374, 
may be used. 

A simple asynchronous memory interface is shown in Figure 7. 
In this system, main memory is assumed to be fast enough to sup- 
port the block refill requirements of the system, thus simplifying the 
read path. In fact, both the read and write data paths are actually 
managed through a single set of IDT29FCT52A bidirectional latch- 
ing transceivers. 

During write cycles (which are typically captured by Write Buff- 
ers), the processor asserts MemWr to indicate that a write cycle is 
in progress. The memory system negates WrBusy to indicate that 
the processor is done with the write cycle. 

During read cycles, the processor will assert MemRd to indicate 
that a main memory read is in progress. The memory system will 
hold RdBusy active until the desired data is available. The proces- 
sor will activate the XEn signal to allow data to be passed from the 
main memory to the processor data bus. If the cache is to be up- 
dated with the new data, then the processor will assert the appro- 
priate cache write signal to allow the cache RAMs to capture the 
incoming data bus.


Figure 7. IDT79R3001 Asynchronous Interface


The AccTyp bus is used to indicate the size of the data transfer 
(8, 16, 24, or 32 bits), and for main memory reads, whether or not 
the data is “cacheable”. This simplifies the main memory address 
decoding, since the AccTyp indicates whether the main memory 
needs to perform a burst read of multiple words. 


Co-Processor Interface 


The IDT79R3001 implements a co-processor interface, which
allows the use of the IDT79R3010 high-performance RISC Floating
Point Accelerator without requiring the use of external interface
components.

The co-processor interface has been designed to make system
co-processors appear to the programmer as if they were on-chip
extensions of the core execution engine. Thus, the IDT79R3010
FPA works as a true co-processor, rather than as a peripheral
which must be programmed.

In the IDT79R3001 co-processor model, the CPU is responsible
for controlling all data cycles. The co-processor keeps in synchronization
with the CPU (including the pipeline stages), and uses a
Phase-Locked Loop to keep synchronized with the processor bus
traffic. The co-processor then “snoops” the data bus, watching for
co-processor instructions. It also knows when data cycles on the
bus are intended for it (either as a target in co-processor load
operations, or as a source for co-processor store operations), and
performs the data portion of the operation when appropriate.
Thus, co-processors effectively load and store directly with memory,
without requiring operands to go through the CPU first. This
achieves the highest levels of performance (note that the co-processor
interface also supports move, whereby data can be moved
directly between the CPU and any co-processor).

Figure 8 illustrates the use of the IDT79R3010 in an IDT79R3001
system. The co-processor interface manages synchronization
between the parts, and is used to communicate status from the
co-processor to the CPU. CpBusy, or co-processor busy, stalls the
CPU until the busy co-processor resource (requested by a
co-processor instruction) is free, and CpCond, or co-processor condition,
is used to report status on co-processor test instructions.
CpSync is used to help the co-processor stay “locked” to the
CPU, so that the co-processor knows when data is on the bus to
be sampled on load operations or when to place data on the bus for
store operations.

Note that the co-processor sits on the same data bus as the
CPU, but has no connection to the address bus. The CPU is
responsible for performing all memory addressing, including the
determination of “cache hit”, write-buffer full cycles, and any processing
that might be required for cache misses.


Figure 8. IDT79R3001 Interface to IDT79R3010 Floating Point Co-Processor


Interrupts


The IDT79R3001 features 6 separate interrupt input pins. Inter- 
rupts are not vectored, but rather cause the general exception vec- 
tor address to be the next execution address. 

These pins are not encoded internally; external logic can choose 
to implement these interrupt lines as either 6 or 64 interrupt 
sources; software would then perform the appropriate decoding to 
get to the specific interrupt handler. 
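
The software decoding mentioned above can be sketched as follows. The register layout used here (interrupt pending and mask fields in bits 15 to 8 of the CP0 Cause and Status registers, with the six hardware inputs in the upper six of those bits) follows the MIPS R3000 architecture and is not spelled out in this data sheet; the function names are illustrative, and the global interrupt enable bit is ignored for brevity.

```python
IP_SHIFT = 8       # IP (Cause) and IM (Status) fields occupy bits 15..8
HW_INT_BASE = 2    # Int(0) maps to IP2; IP0 and IP1 are the software interrupts


def pending_hw_interrupts(cause, status):
    """Return the numbers n of the Int(n) inputs that are pending and unmasked."""
    pending = ((cause & status) >> IP_SHIFT) & 0xFF
    return [n for n in range(6) if pending & (1 << (n + HW_INT_BASE))]


def encoded_source(cause):
    """Read the six inputs as one 6-bit code, for systems in which external
    logic encodes up to 64 sources onto the interrupt lines."""
    return (cause >> (IP_SHIFT + HW_INT_BASE)) & 0x3F


if __name__ == "__main__":
    cause = 0x8400  # IP7 and IP2 set, i.e. Int(5) and Int(0) asserted
    print(pending_hw_interrupts(cause, 0xFF00))
    print(pending_hw_interrupts(cause, 0x0400))
    print(encoded_source(cause))
```

A handler for the six-source arrangement would typically service the entries of the returned list in a priority order chosen by software, since the hardware does not vector or prioritize them.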

Interrupts are recognized in the ALU stage of the on-chip pipeline.
Instructions less advanced in the pipeline are “flushed” and
will be restarted when the return from exception occurs (an on-chip
register contains the address of the instruction which was
excepted). Instructions further advanced in the pipeline are allowed
to continue. Unlike other RISC processors, the IDT79R3001 does
not require the programmer to save and restore pipeline status to
allow normal execution to be resumed. Depending on the application
and exception, software would at most need to save/restore
the on-chip data registers, status register, Exception PC and
exception “cause” register.

Note that the co-processor model includes “precise exceptions”.
That is, an exception is signaled to the exact instruction which
generated the exceptional condition. No further state commitments
are made by the IDT79R3001 and, thus, the exact context at the
time of the exception is known to the programmer. This is true
even for multi-cycle operations, such as those of the FPA.


DMA Interface 


The IDT79R3001 features a simple DMA interface which allows 
an external master to gain control of the synchronous memory 
space. Note that it is not necessary to include logic on the CPU to 
arbitrate for the asynchronous memory space; the read/write 
buffer interface is where such arbitration logic belongs and it is left 
to the system designer to implement the type of asynchronous 
memory structure that best fits the application. 
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Figure 9. IDT79R3001 DMA Interface 


When an external master “owns” the synchronous bus, the CPU 
will tri-state the following pins and buses: 


AddrLo: The synchronous memory direct address bus.

Data & Tag: The synchronous memory RAM data lines.

Cache Control: IRd, IWr, IClk, DRd, DWr and DClk. This allows
the external master to use the existing control
lines to control the synchronous memory.

XEn: The read buffer transceiver enable, which will allow
the external master to use the read/write buffer path
for DMA.

Valid: This enables the DMA interface to be used for
multiprocessing applications.


The DMA interface consists of a single input signal, DMAStall,
which causes the processor to stall and to tri-state the above
named lines. The external master is guaranteed mastership of the
bus within a small number of cycles, depending on the exact
external bus activity of the CPU when the DMA was requested.
The DMA master negates the DMAStall signal when the DMA
operation is completed to allow the CPU to resume processing.
Consult the “IDT79R3001 Hardware User’s Guide” for more details.

Figure 9 illustrates the system connection of an external DMA
master to an IDT79R3001 system.


Advanced Features 


The IDT79R3001 contains special features which provide added 
flexibility across a number of applications, as well as allow for sys- 
tem diagnostic support. 

In support of diagnostics, the IDT79R3001 allows for cache 
“swapping” (interchange of which memory bank is for instruction 
and which is for data), which is useful in system initialization, cache 
flushing, and diagnostics. Additionally, the caches can be “iso- 
lated” from main memory, which forces cache “hits” to occur re- 
gardless of the tag comparison, and which is useful in determining 
that the synchronous memory space RAMs are functional. 

An additional feature is the ability to enable parity checking over
the data field of each synchronous memory. If parity is enabled,
the processor will check the parity when a synchronous access
occurs; if a parity error is detected, it is signaled to the external world
on the Parity Error signal and a cache-miss cycle is processed.
The Parity Error signal will remain low until the parity error flag in
the CP0 status register is cleared by software.

A number of other system selectable features are selected at reset
time. The input reset “vectors” are sampled on the interrupt input
lines during the last four cycles of the reset period. The input
vectors are listed in Table 1. These selections include the ability to
select the block refill sizes for each of the instruction and data
memories, whether Big Endian or Little Endian order is to be used,
whether to use data parity, and whether or not to accommodate a
Phase-Locked Loop for a co-processor. The initialization of the
CPU and meaning of each input vector is more fully explained in
the “IDT79R3001 Hardware User's Guide”.
Table 1. IDT79R3001 Mode Selectable Features


PROCESSOR ARCHITECTURE 


The IDT79R3001 is a full implementation of the IDT79R2000A/ 
IDT79R3000 Instruction Set Architecture (the MIPS-I ISA). This 
architecture is discussed in great detail in “mips RISC Architec- 
ture”, available from IDT. 


IDT79R3001 CPU Registers 


The IDT79R3001 CPU provides 32 general purpose (orthogonal)
32-bit registers, a 32-bit Program Counter, and two 32-bit
registers used to hold the results of the CPU integer multiply and
divide operations.
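
The role of the two multiply/divide result registers can be sketched as follows. This is an illustrative model, not a description of the hardware: it assumes the usual MIPS-I convention that a multiply leaves the upper and lower halves of the 64-bit product in HI and LO, and that a divide leaves the remainder in HI and the quotient in LO. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def mult(rs, rt):
    """Signed 32 x 32 multiply; returns (hi, lo) halves of the 64-bit product."""
    product = to_signed(rs) * to_signed(rt) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32


def div(rs, rt):
    """Signed divide; returns (hi, lo) = (remainder, quotient).

    The quotient is truncated toward zero. Division by zero raises
    ZeroDivisionError here; this model does not define a result for it.
    """
    a, b = to_signed(rs), to_signed(rt)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32
```

The results would then be read back into general registers with the Move From HI and Move From LO instructions listed in Table 2.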

Two of the 32 general registers have special purposes designed
to increase processor performance: register r0 is hardwired to the
value “0”, a useful constant; and register r31 is used as the link
register in jump-and-link instructions (the return address for
subroutine calls). Otherwise, there is no requirement that a particular
register be used as a stack or frame pointer, etc., although there is a
register convention as part of the “mips ABI” (Applications Binary
Interface standard) which the compiler suite uses.

The CPU registers are illustrated in Figure 10. Note that there is
no Program Status Word register shown in this figure. The
functions traditionally provided by a PSW register are instead provided
in the Status and Cause Registers incorporated within the on-chip
System Control Co-Processor (CP0). The instruction set does not
use condition codes.


Figure 10. IDT79R3001 Registers (panels: General Purpose Registers, Multiply/Divide Registers)


Instruction Set Overview 


All IDT79R3001 instructions are 32 bits long and there are only 
three instruction formats (see Figure 11). This approach simplifies 
decoding, thus minimizing instruction execution time. The 
IDT79R3001 processor initiates a new instruction on every RUN 
cycle, and is able to complete an instruction on almost every clock 
cycle. The only exceptions are the LOAD instructions and 
BRANCH instructions, which each have a single cycle of latency 
associated with their execution (that is, the instruction immediately 
after the branch is always executed regardless of the branch
condition; similarly, the data loaded by a LOAD instruction is not
available to the subsequent instruction). However, in the majority of
cases the compilers (and even the MIPS assembler) are able to
reorder instructions to fill these latency cycles with useful instructions
which do not require the results of the previous instruction (in the
worst case, a NOP instruction is inserted). This effectively
eliminates these latency effects and does not require the applications
programmer to be aware of the pipeline structure.
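
The worst-case handling of the load latency mentioned above (inserting a NOP) can be sketched in Python. This is a simplified illustration, not the actual MIPS assembler algorithm: it only inserts a NOP when the instruction after a load reads the loaded register, and makes no attempt to reorder code. The tuple representation and the function name are assumptions made for the example.

```python
LOADS = {"lb", "lbu", "lh", "lhu", "lw", "lwl", "lwr"}


def fill_load_delays(program):
    """Insert a NOP after any load whose result is used by the next instruction.

    Each instruction is a (mnemonic, dest_reg, source_regs) tuple, with
    registers given as integers 0..31 and dest_reg None when nothing is written.
    """
    out = []
    for instr in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            sources = instr[2]
            # r0 is hardwired to zero, so a "load" into r0 creates no hazard.
            if prev_op in LOADS and prev_dest not in (None, 0) and prev_dest in sources:
                out.append(("nop", None, ()))
        out.append(instr)
    return out
```

For example, a load into r8 followed immediately by an add that reads r8 gains one NOP between the two, while a load followed by an unrelated instruction is left unchanged.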

The actual instruction set of the CPU was determined after ex- 
tensive simulations to determine which instructions should be im- 
plemented in hardware and which operations are best synthesized 
in software from other basic operations. This methodology has re- 
sulted in the highest performance processor available. 
Figure 11. IDT79R3001 Instruction Formats


The IDT79R3001 instruction set can be divided into the following 
groups: 

• Load/Store Instructions move data between memory and
the general registers. These are all “I-Type” instructions.
The only addressing mode supported is base register plus
signed, immediate 16-bit offset. This effectively allows three
addressing modes: register plus offset, register (using zero
offset), and immediate (using r0, the zero register).

The Load instruction has a single cycle of latency, as
described above. That is, the instruction immediately after
the load instruction cannot rely on the new data; however, the
assembler and compilers automatically handle this,
reordering code to ensure that no conflicts occur. Note that
the store operation has no latency in its effect.

Loads and stores can be performed on byte, half-word,
word, or unaligned word data (32-bit data not aligned on a
modulo-4 address).

• Computational instructions perform arithmetic, logical, and
shift operations on values in registers. They occur in both
“R-Type” (both operands and the result are general
registers), and “I-Type” (one operand is a 16-bit immediate
value) formats.

Note that computational instructions are three operand
instructions: that is, the result register can be different from
both source registers. This means that operands need not
be overwritten by arithmetic operations. This results in a
more efficient use of the register set, and further increases
performance.

• Jump and Branch instructions change the flow of control of a
program. Jumps are always to a paged absolute address
formed by combining a 26-bit target with the four high-order bits
of the Program Counter (“J-Type” format, for subroutine calls), or
32-bit register byte addresses (“R-Type”, for Returns and
dispatches). Branches have 16-bit offsets relative to the
program counter (“I-Type”).

Jump and Link instructions save a return address in Register
31. The IDT79R3001 instruction set features numerous
branch conditions. Included is the ability to branch based on
a comparison of two registers, or on the comparison of a
register to zero. Thus, net performance is increased since
the processor does not have to precede the branch
instruction with arithmetic operations.

• Co-processor instructions perform operations in the
co-processors (such as the IDT79R3010 FPA).
Co-processor Loads and Stores are “I-Type”; computational
instructions have co-processor dependent formats.

• Co-processor 0 instructions perform operations on the
System Control Co-Processor (CP0) registers to manipulate
the memory management and exception handling facilities of
the on-chip co-processor.

• Special instructions perform a variety of tasks, including
movement of data between general and special registers,
system calls, and breakpoint operations. These are always
“R-Type”.
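
The three instruction formats and the address calculations described in the list above can be sketched in Python. The bit positions used here are the standard MIPS-I field layout (6-bit opcode, 5-bit register fields, 16-bit immediate, 26-bit target), which this section does not spell out. The sketch also assumes that jump and branch targets are formed from the address of the instruction following the jump or branch. All function names are illustrative.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    """Split a 32-bit instruction word into the fields used by all three formats."""
    return {
        "opcode": (word >> 26) & 0x3F,   # all formats
        "rs": (word >> 21) & 0x1F,       # I-Type and R-Type
        "rt": (word >> 16) & 0x1F,       # I-Type and R-Type
        "rd": (word >> 11) & 0x1F,       # R-Type
        "shamt": (word >> 6) & 0x1F,     # R-Type
        "funct": word & 0x3F,            # R-Type
        "immediate": word & 0xFFFF,      # I-Type
        "target": word & 0x03FFFFFF,     # J-Type
    }


def load_store_address(base_value, offset16):
    """Base register plus signed 16-bit offset, wrapped to 32 bits."""
    return (base_value + sign_extend16(offset16)) & 0xFFFFFFFF


def jump_target(next_pc, target26):
    """Combine the four high-order PC bits with the 26-bit word target."""
    return (next_pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_target(next_pc, offset16):
    """PC-relative branch: signed 16-bit word offset scaled to bytes."""
    return (next_pc + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
```

For instance, `decode(0x8C880004)` yields opcode 0x23 with rs = 4, rt = 8 and an immediate of 4, which is a Load Word from the address held in r4 plus 4 into r8.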


IDT79R3001 System Control Co-processor (CP0)


The IDT79R3001 can operate with up to four tightly coupled
co-processors, designated CP0-CP3. CP0, the System Control
Co-processor, is included on-chip. CP0 is
responsible for supporting both the virtual memory system and the
exception handling functions of the IDT79R3001.
OP        Description

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Co-processor Instructions
LWCz      Load Word from Co-processor
SWCz      Store Word to Co-processor
MTCz      Move To Co-processor
MFCz      Move From Co-processor
CTCz      Move Control to Co-processor
CFCz      Move Control From Co-processor
COPz      Co-processor Operation
BCzT      Branch on Co-processor z True
BCzF      Branch on Co-processor z False

System Control Co-processor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read Indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception

Table 2. IDT79R3001 Instruction Summary
Figure 12. System Control Co-processor (CP0) Registers (legend: registers used with exception processing; registers used with the virtual memory system)


CP0 Registers


As a co-processor, CP0 has a number of registers which it uses
to perform its control functions. These include a 64-entry, fully
associative Translation Lookaside Buffer (TLB), used to manage the
virtual memory space; registers to manage the TLB set; and the
exception handling registers. Figure 12 illustrates the register set of the
System Control Co-processor. Table 3 provides a brief explanation
of the function of each of these registers. A more detailed
explanation of the use of each of these registers is included in the
“mips RISC Architecture” manual.


Register   Description
Cause      Indicates nature of last exception
EPC        Exception Program Counter; contains address of instruction which detected the exception
Context    Pointer into the kernel's virtual Page Table Entry array
BadVA      Most recent bad virtual address
PRId       Processor revision identification

Table 3. CP0 Registers
Memory Management System


The IDT79R3001 supports a virtual memory system, so that
each task in a given application can be unaware of the addressing
needs of other tasks. This is also useful in systems with limited
physical memory; the IDT79R3001 provides for the logical
expansion of memory by translating addresses composed in a large
virtual space into available physical memory addresses.


IDT79R3001 Operating Modes 


The IDT79R3001 has two operating modes: User Mode and Kernel
Mode. The IDT79R3001 normally operates in User Mode
until an exception is detected, forcing it into Kernel Mode. The
processor remains in Kernel Mode until the exception is handled
and the processor executes an RFE (Restore From Exception)
instruction, which restores it to User Mode. Kernel Mode allows
software to alter machine state information such as that contained
in the CP0 registers. If an access to Co-processor 0 is attempted in
User Mode and the Kernel has not enabled the User to access
the co-processor, an exception will occur. Similarly, if a User
task attempts to use a Kernel virtual address, an exception will
occur. Thus, system resources are protected from User tasks.

The manner in which memory addresses are translated 
(mapped) depends on the operating mode of the IDT79R3001 and 
on the virtual address desired. Figure 13 illustrates the virtual ad- 
dress mapping performed by the IDT79R3001: 

User Mode - In this mode, a single, uniform virtual address
space (kuseg) of 2 Gbytes is available to each user task (tasks are
further identified by a 6-bit process identifier field in order to form
unique virtual addresses). All references to this segment are
mapped using the TLB, which utilizes both the virtual address and
the Process ID field to perform the virtual-to-physical mapping
(note that this allows the cache to be shared by up to 64 User
processes at a time without requiring time-consuming Cache or TLB
flushing).


MMU ADDRESS TRANSLATION (virtual address map, Figure 13)

Virtual address range        Segment
0xc0000000 - 0xffffffff      Kernel Mapped (kseg2)
0xa0000000 - 0xbfffffff      Kernel Uncached (kseg1)
0x80000000 - 0x9fffffff      Kernel Cached (kseg0)
0x00000000 - 0x7fffffff      User Mapped, Cacheable (kuseg)



Kernel Mode - Four separate segments are accessible through
this mode:

• kuseg - When in Kernel Mode, references to this segment
are treated just like User Mode references, thus streamlining
Kernel accesses to User memory.

• kseg0 - References to this 512 Mbyte segment may use the
cache memory, but are not translated by the TLB. Instead, these
addresses map directly to the first 512 Mbytes of the physical
address space. Note that many dedicated embedded applications
will utilize this address space and kseg1 only, rather than any of the
TLB mapped segments.

• kseg1 - References to this 512 Mbyte segment are not
mapped through the TLB. Additionally, this memory is viewed as
uncacheable, which means that references through this segment
will always use the asynchronous memory interface. As with
kseg0, references through this segment are hard-mapped to the
first 512 Mbytes of physical memory. When the processor boots,
the reset vector is contained in this segment, so that the processor
does not require either the cache or the TLB to be valid at RESET
time.

• kseg2 - References to this 1 Gbyte segment are always
mapped through the TLB. As with kuseg, the ability of memory
pages to be cached is determined by a bit setting in the TLB entry
for that page.
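
The segment rules above can be summarized in a short Python sketch. The segment boundaries come from the address map in this section; the function and exception names are hypothetical. The example address 0xBFC00000 is the conventional MIPS reset vector, which this section does not state explicitly.

```python
class AddressError(Exception):
    """Stands in for the exception raised on an illegal User Mode reference."""


def classify(vaddr, kernel_mode):
    """Return (segment, mapped_by_tlb, physical_address).

    physical_address is None for TLB-mapped segments, because the
    translation then depends on the TLB contents.
    """
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        return ("kuseg", True, None)
    if not kernel_mode:
        raise AddressError("User Mode reference to a Kernel virtual address")
    if vaddr < 0xA0000000:
        # Cached, unmapped: first 512 Mbytes of physical memory.
        return ("kseg0", False, vaddr - 0x80000000)
    if vaddr < 0xC0000000:
        # Uncached, unmapped: the same first 512 Mbytes of physical memory.
        return ("kseg1", False, vaddr - 0xA0000000)
    return ("kseg2", True, None)
```

Note that kseg0 and kseg1 are two views of the same physical region, so `classify(0x80001000, True)` and `classify(0xA0001000, True)` both resolve to physical address 0x1000; only the caching behavior differs.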


The Translation Lookaside Buffer (TLB) 


The translation of virtual addresses in either kuseg or kseg2
(mapped segments) is performed by the on-chip Translation
Lookaside Buffer array. This array consists of 64 fully-associative
(content addressable) memory elements. Each entry maps a
4 Kbyte virtual page to a 4 Kbyte physical page. Each TLB entry
contains other information about the virtual address it maps (such
as which User process it maps) and also about the physical
address (such as whether it is cacheable or writeable).
EntryHi:
VPN - Virtual Page Number
TLBPID - Process Identifier

EntryLo:
PFN - Physical Frame Number
N - Non-cacheable Physical Page
D - Dirty Page / Write Protect
V - Valid TLB Entry
G - Global translation (ignore PID)
0 - Reserved


Figure 14. TLB Entry Format


Figure 14 illustrates the format of each TLB entry. The transla- 
tion operation is illustrated in Figure 15. The upper portion of the 
desired virtual address is compared against the VPN field of each 
TLB entry. Additionally, the current process ID (contained in the 
TLBHI register) is matched against the PID field of the TLB entry (if 
the TLB entry is marked as Global, the PID comparison is ignored). 
If a match occurs, and the TLB entry is marked as Valid, then the 
translation is completed by replacing the VPN of the virtual ad- 
dress with the corresponding PFN (Physical Frame Number). 
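
The matching rules just described can be modeled in Python as follows. This is a behavioral sketch only: the hardware compares all 64 entries at once, whereas the loop below checks them one at a time. The 12-bit page offset follows from the 4 Kbyte page size given above; the class and function names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # 4 Kbyte pages


@dataclass
class TlbEntry:
    vpn: int               # Virtual Page Number
    pid: int               # Process Identifier (6 bits)
    pfn: int               # Physical Frame Number
    valid: bool = True     # V bit
    global_: bool = False  # G bit: ignore the PID comparison


def translate(tlb, vaddr, current_pid):
    """Return the physical address, or None when no valid entry matches."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for entry in tlb:
        if entry.vpn != vpn:
            continue
        if not entry.global_ and entry.pid != current_pid:
            continue
        if not entry.valid:
            return None  # matched, but the entry is not marked Valid
        return (entry.pfn << PAGE_SHIFT) | offset
    return None  # TLB miss, handled by software
```

A `None` result corresponds to the cases that the processor hands to software exception handlers.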

Note that the use of the TLB does not incur an execution penalty, 
since the execution engine pipeline includes stages to cover for 
the time required to make the TLB search and translation. 

TLB misses occur when no successful match occurs. These
events are handled in software. The CP0 registers give the
software enough information to obtain the appropriate TLB entry at
speeds which exceed those achieved by many CPUs which use
hardware TLB replacement (10-12 cycles under UNIX).

When a TLB miss occurs, the address of the instruction which
was executing is stored in the EPC register, and the BadVA
register contains the address which was being translated. The Context
register uses the BadVA value to generate a direct pointer to the
kernel Page Table Entry for the desired virtual address. The
Random register suggests the TLB entry to be replaced by the new
entry. Note that the lower eight TLB entries are not pointed to by
Random; the kernel software can thus ensure that it is constantly
mapped, and deterministic response is guaranteed.
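
How the Context register can yield a direct Page Table Entry pointer is sketched below. The field layout is an assumption taken from the R3000-family architecture and is not given in this section: a page table base in bits 31..21, with bits 30..12 of BadVA placed in bits 20..2, so that each virtual page indexes one 4-byte entry. The function names are hypothetical.

```python
def context_value(pte_base, bad_va):
    """Form the Context register value from the page table base and BadVA.

    Assumed layout: PTEBase in bits 31..21, BadVPN (VA bits 30..12)
    in bits 20..2, low two bits zero (4-byte page table entries).
    """
    bad_vpn = (bad_va >> 12) & 0x7FFFF
    return (pte_base & 0xFFE00000) | (bad_vpn << 2)


def replacement_candidates():
    """TLB indices that the Random register may suggest (entries 8 to 63)."""
    return list(range(8, 64))
```

Under these assumptions, a miss on virtual address 0x00400ABC with a page table base of 0xC0000000 gives a Context value of 0xC0001000, which the miss handler can use directly as a load address.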


Figure 15. Virtual to Physical TLB Translation (labels: Program Counter, Current Process ID, Virtual Address, Physical Address)


BACKWARD COMPATIBILITY WITH 
IDT79R2000A AND 79R3000 PROCESSORS 


The IDT79R3001 can execute the same binary software (either
kernel or user) that is executed by either the IDT79R2000A or
IDT79R3000. At the system level, some hardware re-design is
necessary to achieve the cost savings inherent in the IDT79R3001
hardware interface.
PIN DESCRIPTIONS

PIN NAME | I/O | DESCRIPTION

Memory Interface

Data (0:31) | I/O | A 32-bit bus used for all instruction and data transmission among the processor, synchronous memory space, asynchronous memory space and co-processors.
DataP (0:3) | I/O | A 4-bit bus containing even parity over the data bus. If parity checking is enabled, a parity error will cause the PErr signal to be asserted and a cache-miss to occur. Regardless of whether parity checking is enabled, the processor will always generate parity on writes.
Tag (13:31) | I/O | A 19-bit bus used for transferring cache tags and high-order address bits between the processor, caches and asynchronous memory spaces.
AddrLo (0:23) | O | A 24-bit bus containing low-order byte addresses for both the synchronous (cache) and asynchronous memory spaces.

Synchronous Memory Control

 | O | The output enable for the instruction cache. The polarity of this signal is selectable.
 | O | The write enable for the instruction cache. The polarity of this signal is selectable.
IClk | O | The instruction cache address latch clock. The clock runs continuously.
 | O | The output enable for the data cache. The polarity of this signal is selectable.
DWr | | The write enable for the data cache. The polarity of this signal is selectable.
DClk | | The data cache address latch clock. The clock runs continuously.
 | | A high on this signal indicates that the Tags just read from the cache are valid. When a cache update occurs, the processor will generate the appropriate Valid bit.
PErr | | If parity checking is enabled, this signal is an active low output of the internal CP0 parity error status bit. It is driven low when a parity error is detected and remains low until software clears the parity error flag in the status register. This pin is physically the same pin as AccTyp2. Its function is selected during device reset.

Asynchronous Memory Interface

XEn | | The transceiver enable for the read buffer.
AccTyp (0:2) | | A 3-bit bus used to indicate the size of data being transferred on the asynchronous memory bus, whether or not a data transfer is occurring and the purpose of the transfer. If parity checking is enabled, AccTyp2 becomes the PErr signal.
MemWr | O | Signals the occurrence of an asynchronous memory write cycle.
MemRd | O | Signals the occurrence of an asynchronous memory read.
BusError | | Signals the occurrence of a bus error during an asynchronous memory access.
Run | O | Indicates whether the processor is in a RUN or STALL state.
Exception | | Indicates the instruction about to commit processor state should be aborted and other exception related information.
SysOut | | A clock derived from the internal processor clock used to generate the system clock.
RdBusy | | The asynchronous memory read stall termination signal. In most system designs, RdBusy is normally asserted and is deasserted only to indicate the successful completion of the memory read. RdBusy is sampled by the processor only during memory read stalls.
CpBusy | | The co-processor busy stall initiation/termination signal.
CpCond (0:3) | | A 4-bit bus used to transfer conditional branch status from the co-processors to the CPU. CpCond(0) is used to control whether or not a cache burst refill occurs; the other signals are used as input port pins for co-processor branch instructions.

Processor Control Signals

DMAStall | | DMA Stall. Signals to the processor that it should stall accesses to the synchronous memories and tri-state the synchronous memory interface.
Int (0:5) | | A 6-bit bus used to signal maskable interrupts to the CPU. At reset time, mode values are sampled from this bus to initialize the processor. During normal operation, these signals are not latched by the processor and must remain asserted until the processor acknowledges the interrupt (through software) to the interrupt source.
Clk2xSys | | The master double frequency input clock, used to generate SysOut.
Clk2xSmp/Rd | | A double frequency clock input used to determine the sample point for data coming into the CPU and co-processors and used to determine the enable time of the synchronous memory RAMs.
Clk2xPhi | | A double frequency clock input used to determine the position of the two internal phases.
Reset | | Initialization input used to force execution starting from the reset memory address. Reset should be asserted asynchronously but must be negated synchronously with the leading edge of SysOut.


ABSOLUTE MAXIMUM RATINGS": ®*) RECOMMENDED OPERATING 


UNIT) TEMPERATURE AND SUPPLY VOLTAGE 


Vrerm | With Respectto | 9.5 to +7.0 -0.5 to +7.0 V TEMPERATURE 
GND mapa ene 0°C to +70°C 5.0 + 5% 
= meee 


°C 
Temperature PACE OUTPUT LOADING FOR AC TESTING 
°C 


Storage 
TsTG Temperature) | ~55to+125 | —65to +150 


Input Voltage —0.5 to +7.0 -0.5 to +7.0 


NOTES: 


1. Stresses greater than those listed under ABSOLUTE MAXIMUM RAT- 
INGS may cause permanent damage to the device. This is a stress rating 
only and functional operation of the device at these or any other conditions 
above those indicated in the operational sections of this specification is not 
implied. Exposure to absolute maximum rating conditions for extended 
periods may affect reliability. 


















To Device 
Under Test 









2. VIN minimum = -3.0V for pulse width less than 15ns. 
VIN should not exceed Vcc +0.5 Volts. 
3. Not more than one output should be shorted at a time. Duration of the short 


DC ELECTRICAL CHARACTERISTICS— 


COMMERCIAL TEMPERATURE RANGE (Ta = 0°C to +70°C, Vcc = +5.0V + 5%) 


16.67MHz 20.0MHz 
SYMBOL PARAMETER TEST CONDITIONS MIN. MAX. | MIN. MAX. Pe UNIT 
eee eee ee 


Vec = Min., lon = —4mA 

| Vor | Output Low voltage Vec=Min..ton=4ma_| — 04 | — 04 | — 04 | Vv 

Vec=Min.lon=-8ma | 24 — | 24  — | 24  — | v 

| Vin | nputHHiGH Voltage) | 

Fvu | tnputtowvotage sf CS 
eee ate 


















| 40 | 

: | = 08 | 

| 200 | 

| — oe | 

| 300 

p — od 

ji InputHiGH Leakage) | V=Voo 

in Input LOW Leakage () 
NOTES: 


1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp/Rd, Clk2xPhi, CpBusy, and Reset.

3. These parameters do not apply to the clock inputs. 
4 


10 [oF | 






—40 40 


Pes eae 
na ee 
eae 
es 
ass | 80 

T= | =o P10 [a 
aS ee 
0 


4. VOHT and VOLT apply to the bidirectional data and tag buses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are supplied as
additional information to help the system designer understand the relationship between current drive and output voltage on these pins.


5. ViH should not be held above Vcc + 0.5 volts. 

6. The IDT79R3001 contains an internal pull-up/current source on the TAG pins to facilitate initialization. This current source is disconnected when Reset
is inactive.

7. Guaranteed by design. 

8. VOHC applies to Run and Exception. 
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AC ELECTRICAL CHARACTERISTICS—™) 
COMMERCIAL TEMPERATURE RANGE (Ta =0°C to +70°C, Voc = +5.0V + 5%) 


fsrweor] raraweven | Tesroonoons [ft [Mi Lk Mi | NT 

Clock 

Tortigh [Input Clock High® | Transition<sns_ J 126 = — | 10 =— | 8  — | ns | 

Textow Transition<éns_[ 125 — | to — | @  — | ns | 
[30 s00 | 25 500 |_20 500 | _ ns _| 

Tex | 0 TeycsAf 0 Toye] 0 Teyci4| ns _ 
| 9 Toye] 7 Toyo] 5 Toye} ns_| 

Run Operation 

Toen 

Toois : 

Toval Load = 25pF a ee ee 

Twidly Load = 25pF ee ee ee ee 

Tos 

Tor 























Tosi | 
Tacty Load = 25pF 
Tate Load = 25pF an 
Tw Load = 25pF 17 118 
Texe _| Exception Load = 25pF ae a ee es ee 


Stall Operation 
Address Valid Load = 25pF p— so | — 
Load = 25pF = 7 | = 28 | 


Address Type 
Load = 25pF 1 27 1 23 


Load=250F | — 7 | ~ 7 | 


— 
w |: 
| 





Tsaval 
TsAcTy 
Tmrdi 
Turd 
Tsu 
TRun 
TsmMWr 


TsEc Exception Valid 
Tpmavis | DMA Drive On 


Tpmaen | DMA Drive Off 
Reset Initialization 


Reset Pulse Width 
Reset Pulse Width, Pull-downs 
on Tag 





Memory Read Initiate 
Read Terminate 


iene. (ok fe 
ioad=25F | 1 a7 | 3 | 1 18 | 
load=25F | — 20 | — a | — 15 | 


Run Terminate 
Run Initiate 
Memory Write 


Trst 
TRSTTAG 



Capacitive Load Deration 


Load Derate® Pos tos tt fost sia pF 


NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all three 2x Clocks: Clk2xSys, Clk2xSmp/Rd and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters are illustrated in detail in the “IDT79R3001 Hardware Interface Guide”.
5. Tcyc is one CPU clock cycle (2 cycles of a 2x clock).
6. With the exception of Run, no two signals of a given device will derate by a difference greater than 15%.
IDT79R3001 RISController


PIN CONFIGURATIONS 
172-Pin Ceramic Flatpack (Cavity Side View) 





f 
@ 


Data21 
Data22 
Data24 
Data25 
Data26 
Data31 
DataP3 
Data27 
Data28 
XEn 

Data29 
Data30 
Exc 


Clk2xPhi 
GND7 

GND6 

CpCond2 

VCC7 

VCC6 

GND5 

GND4 

GND3 

VCC5 IDT79R3001 RISController 
VCC4 

VCC3 

GND2 

GND1 


Clk2xSy 
Cpsync 
MemWr 
AccTy1 
Run 
vcc2 
VCC1 
Cik2xSmp/Rd 
SysOut 
DClk 

ond3 
MemRd 
AccTyO 
AccTy2 
DmAStall 






Ss 


_ 


EEE 


= 
a 


AdrLo23 
AdrLoz2 CI 
AdrLo21 C4 
AdrLo20 [— 
AdrLoi9 (74 
AdrLo1s C4 


Note: 
1. AccTyp2 is redefined to be Parity Error if the parity enable option is selected at device initialization. 

























Adrlo17 
Into 
Int 








Int2 
int3 
Intd 








— 
dO 


PIN CONFIGURATIONS (continued) 
144—-Pin PGA (Top View) 


1 2 3 4 5 6 7 8 9 10 11 12 14 

VCC14] AdrLo} AdrLo} AdrLo |VCC12] AdrLo} AdrLo as AdrLo]} AdrLo| Int2 Wr_ | Reset |VCC10 
6 10 11 14 15 16 17 Busy 

AdrLo Mem AdrLo | AdrLo me rg ae Int? | Int3 Cp Bus Tag13 | Tagi6 

3 Wr 7 9 Sas Busy | Error 

fae AdrLo | VCC13 Aah pace GND13] GND12} VCC11} Into | Int4 Rd Tag14 | Tagi7 | Tag20 
4 Busy 

Adio Tagto | Tag2t 
2 

AdrLo Tagi8 | Tag22 | VCC9 

1 

7 

Data | GND1 GND9 Tag24 | Tag26 
3 

Pelee IDT79R3001 RISController voce | T2928 | Taa7 

rae Tag31 Tag29 

9 

Data | GND2 GND8 | AdrLo | Tag30 
11 19 

VCC1 |} Data | Data ‘ i AdrLo 
12 17 


20 
Data } Data | DataP 
13 16 2 23 


Data | Data | Data | GND3 ear nee VCC3 | VCC4 | GNDS | GND6 Mem DmA Adr 

14 18 19 Stall 

Data ee A AccTy1] Data ae Data | XEn | Data | Clk2x Che DCk Cp |AccTyo DWr 

23 22 27 30 Sys [Smp/Rd Cond3 

VCC2 | Data Data | Data | GND4; Data | Excep | Clk2x | Cp _ |SysOut| VCCS5 | ICIk |AccTy2] VCC6 
21 31 28 29 | tion | Phi | Cond2 


Note: 
1. AccTyp2 is redefined to be Parity Error if the parity enable option is selected at device initialization. 






> 
=a 
oe 

° 


GND7 | AdrLo 


< 
QO 
Q 
“J 






RE 
Figure 16. Input Clock Timing (signals: Clk2xSys, Clk2xSmp/Rd, Clk2xPhi; parameters: Tcklow, Tckp, Tsmp, Trd, Tsys)


* These signals are not actually output from the processor. They are drawn to provide
a reference for other timing diagrams.


Figure 17. Processor Reference Clock Timing (signals: SysOut, Rd/SmpOut*)
Figure 18. Synchronous Memory (Cache) Timing (signals: Phase, AddrLo, AccTyp 0:1, AccTyp 2, Data and Tag Busses; parameter: Tsys)
Figure 19. Memory Write Timing (signals: Phase, SysOut, PhiOut, AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, Data (Output), MemWr, WrBusy)
Figure 20. Memory Read Timing (signals: Phase, SysOut, PhiOut, AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, Data, CpCond0, Run; parameters include Tsys, Tacty, Tsacty)
Figure 21. Co-Processor Load/Store Timing (signals: Data Bus, Run, CpBusy, Exception, CpCond(n); cases shown: Co-Processor Store, Co-Processor Load)
Figure 22. Interrupt Timing (signals: Phase, SysOut, PhiOut)
NOTES:

1. Reset must be negated synchronously; however, it can be asserted asynchronously. Designs should not rely on the proper functioning of SysOut prior
to the assertion of Reset.

2. If Phase-Lock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to ensure that the slowest
co-processor in the system has sufficient time to lock the CPU clocks.

3. Reset is actually sampled in both Phase 1 and Phase 2. To ensure proper initialization, it is recommended that Reset be negated relative to the end of
Phase 1.


Figure 23. Mode Vector Initialization
Figure 24. Entering DMA Stall (signals: Phase, SysOut, PhiOut, DMA Stall, Wr, XEn, IClk, DClk, AdrLo; parameters: tsys, tDMADis)
Figure 25. Completing DMA Stall (signals: Phase, SysOut, DMA Stall, DClk)
ORDERING INFORMATION


Part number fields: Device Type, Speed, Package, Process/Temp. Range

Device Type            79R3001   RISController

Speed                  16        16.67 MHz
                       20        20.0 MHz
                       25        25.0 MHz

Package                G         144-Pin PGA
                       F         172-Pin Flat Pack

Process/Temp. Range    Blank     Commercial (0°C to +70°C)


RISC FLOATING-POINT IDT79R3010 
ACCELERATOR (FPA) 


Integrated Device Technology, Inc. 





FEATURES:

• Hardware Support of Single- and Double-Precision
  Operations:
  - Floating-Point Add
  - Floating-Point Subtract
  - Floating-Point Multiply
  - Floating-Point Divide
  - Floating-Point Comparisons
  - Floating-Point Conversions
• Sustained performance:
  - 9 MFLOPS single precision LINPACK
  - 6 MFLOPS double precision LINPACK
• Cycle Time:
  - 30ns (33.33MHz)
  - 40ns (25MHz)
  - 60ns (16.67MHz)
  - 80ns (12.5MHz)
• Direct, high-speed interface with IDT79R3000 Processor
• Supports Full Conformance With IEEE 754-1985
  Floating-Point Specification
• Full 64-bit operation using sixteen 64-bit data registers
• High-speed CEMOS™ technology
• Pin, function and software compatible with the IDT79R2010A
  RISC FPA
• Military product compliant to MIL-STD-883, Class B
• 32-bit status/control register providing access to all IEEE-
  Standard exception handling
• Load/store architecture allows data movement directly
  between FPA and memory or between CPU and FPA
• Overlapped operation of independent floating point ALUs


DESCRIPTION:

The IDT79R3010 Floating-Point Accelerator (FPA) operates in
conjunction with the IDT79R3000 Processor and extends the
IDT79R3000's instruction set to perform arithmetic operations on
values in floating-point representations. The IDT79R3010 FPA,
with associated system software, fully conforms to the
requirements of ANSI/IEEE Standard 754-1985, "IEEE Standard for
Binary Floating-Point Arithmetic." In addition, the architecture fully
supports the standard's recommendations.

This data sheet provides an overview of the features and
architecture of the 79R3010 FPA. A more detailed description of the
operation of the device is incorporated in the "R3000 Family
Hardware User's Manual", and a more detailed architectural
overview is provided in the "mips RISC Architecture" book, both
available from IDT.
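
The cycle times quoted in the feature list are simply the reciprocals of the listed clock rates. The short Python sketch below shows the arithmetic; the function name `cycle_time_ns` is illustrative only and does not come from the data sheet.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    # 1 MHz corresponds to a 1000 ns period, so period = 1000 / f(MHz).
    return 1000.0 / clock_mhz


if __name__ == "__main__":
    # Clock rates taken from the "Cycle Time" entry of the feature list.
    for mhz in (33.33, 25.0, 16.67, 12.5):
        print(f"{mhz:6.2f} MHz -> {cycle_time_ns(mhz):5.1f} ns")
```

Rounded to the nearest nanosecond, the four listed rates give 30, 40, 60 and 80 ns.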









[Block diagram. Blocks: Register Unit (16 x 64), Exponent Unit, Divide Unit, Multiply Unit, Control Unit & Clocks. Labels: Cache Data, Data Bus, Instructions, Operands, Exponent Part, Fraction, A, B, Result, PLLOn.]


Figure 1. IDT79R3010 Functional Block Diagram 


CEMOS is a trademark of Integrated Device Technology, Inc. 
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IDT79R3010 
RISC FLOATING POINT ACCELERATOR (FPA) 


IDT79R3010 FPA REGISTERS 


The IDT79R3010 FPA provides 32 general purpose 32-bit registers,
a Control/Status register, and a Revision Identification register.
The tightly-coupled coprocessor interface causes the register
resources of the FPA to appear to the systems programmers as an
extension of the CPU internal registers. The FPA registers are
shown in Figure 2.


[Register diagram. General Purpose Registers (FGR/FPR): each 64-bit row holds an odd FGR (FGR1, FGR3, ...) in bits 63..32 and an even FGR (FGR0, FGR2, ...) in bits 31..0. Control/Status Register, bits 31..0: Exceptions/Enables/Modes. Implementation/Revision Register, bits 31..0.]


Figure 2. IDT79R3010 FPA Registers 


Floating-point coprocessor operations reference three types of 
registers: 


• Floating-Point Control Registers (FCR) 
• Floating-Point General Registers (FGR) 
• Floating-Point Registers (FPR) 


Floating-Point General Registers (FGR) 


There are 32 Floating-Point General Registers (FGR) on the 
FPA. They represent directly-addressable 32-bit registers, and 
can be accessed by Load, Store, or Move Operations. 


Floating-Point Registers (FPR) 


The 32 FGRs described in the preceding paragraph are also 
used to form sixteen 64-bit Floating-Point Registers (FPR). Pairs 
of general registers (FGRs), for example FGR0 and FGR1 (refer to 
Figure 2), are physically combined to form a single 64-bit FPR. The 
FPRs hold a value in either single- or double-precision
floating-point format. Double-precision format FPRs are formed from two 
adjacent FGRs. 
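
As an illustration of how two adjacent 32-bit FGRs make up one 64-bit double-precision FPR, the following Python sketch packs and unpacks a register pair. It assumes the odd-numbered FGR holds bits 63..32 and the even-numbered FGR holds bits 31..0, as the Figure 2 layout suggests; the function names are hypothetical and are not part of any IDT software.

```python
import struct


def fpr_from_fgr_pair(fgr_even: int, fgr_odd: int) -> float:
    """Interpret an (even, odd) FGR pair as one IEEE 754 double.

    Assumption: the odd FGR supplies bits 63..32, the even FGR bits 31..0.
    """
    bits = ((fgr_odd & 0xFFFFFFFF) << 32) | (fgr_even & 0xFFFFFFFF)
    # Reinterpret the 64-bit pattern as a double without converting it.
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fgr_pair_from_double(value: float):
    """Split a double into the (even, odd) 32-bit words of an FGR pair."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


if __name__ == "__main__":
    even, odd = fgr_pair_from_double(1.0)
    print(f"FGR0 = 0x{even:08X}, FGR1 = 0x{odd:08X}")
    print(fpr_from_fgr_pair(even, odd))
```

Because the words are only reinterpreted, never converted, the sketch mirrors the way 32-bit loads and moves fill an FPR one word at a time.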


Floating-Point Control Registers (FCR) 


There are 2 Floating-Point Control Registers (FCR) on the FPA. 
They can be accessed only by Move operations and include the 
following: 


• Control/Status register, used to control and monitor 
  exceptions, operating modes, and rounding modes; 

• Revision register, containing revision information about the 
  FPA. 


COPROCESSOR OPERATION 


The FPA continually monitors the IDT79R3000 processor in- 
struction stream. If an instruction does not apply to the 
coprocessor, it is ignored; if an instruction does apply to the 
coprocessor, the FPA executes that instruction and transfers nec- 
essary result and exception data synchronously to the 
IDT79R3000 main processor. 


The FPA performs three types of operations: 
• Loads and Stores; 
• Moves; 
• Two- and three-register floating-point operations. 


Load, Store, and Move Operations 


Load, Store, and Move operations move data between memory 
or the IDT79R3000 Processor registers and the IDT79R3010 FPA 
registers. These operations perform no format conversions and 
cause no floating-point exceptions. Load, Store, and Move opera- 
tions reference a single 32-bit word of either the Floating-Point 
General Registers (FGR) or the Floating-Point Control Registers 
(FCR). 


Floating-Point Operations 


The FPA supports the following single- and double-precision for- 
mat floating-point operations: 
• Add 
• Subtract 
• Multiply 
• Divide 
• Absolute Value 
• Move 
• Negate 
• Compare 
In addition, the FPA supports conversions between single- and 
double-precision floating-point formats and fixed-point formats. 
The FPA incorporates separate Add/Subtract, Multiply, and Di- 
vide units, each capable of independent and concurrent operation. 
Thus, to achieve very high performance, floating point divides can 
be overlapped with floating point multiplies and floating point addi- 
tions. These floating point operations occur independently of the 
actions of the CPU, allowing further overlap of integer and floating 
point operations. Figure 3 illustrates an example of the types of 
overlap permissible. 
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Only Load, Store, and Move operations 
are permitted in FPA during these cycles. 


Other FPA instructions can proceed during 
these cycles. However, two multiply or two 
divide operations cannot be overlapped. 


These cycles are free for integer operations
in the CPU. 





Figure 3. Examples of Overlapping Floating Point Operation 
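
The overlap rule stated with Figure 3 can be expressed as a small check. The Python sketch below is a simplified model, not a description of the FPA's interlock logic: it assumes each operation occupies exactly one of the three units named in the text (Add/Subtract, Multiply, Divide) and that two operations may overlap only when they use different units. The data sheet states this explicitly only for two multiplies or two divides; treating two add/subtract operations the same way is an assumption.

```python
# Hypothetical mapping from operation name to the unit that executes it.
UNIT_OF = {
    "add": "add/subtract",
    "sub": "add/subtract",
    "mul": "multiply",
    "div": "divide",
}


def can_overlap(op_a: str, op_b: str) -> bool:
    """True if the two operations use different functional units."""
    return UNIT_OF[op_a] != UNIT_OF[op_b]


if __name__ == "__main__":
    for pair in (("div", "mul"), ("div", "add"), ("mul", "mul"), ("div", "div")):
        print(pair, "->", "overlap" if can_overlap(*pair) else "no overlap")
```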


Exceptions 

The IDT79R3010 FPA supports all five IEEE standard 
exceptions: 
• Invalid Operation 
• Inexact Operation 
• Division by Zero 
• Overflow 
• Underflow 
The FPA also supports the optional Unimplemented Operation 
exception that allows unimplemented instructions to trap to
software emulation routines. 
The FPA provides precise exception capability to the CPU; that 
is, the execution of a floating point operation which generates an 
exception causes that exception to occur at the CPU instruction 
which caused the operation. This precise exception capability is a 
requirement in applications and languages which provide a 
mechanism for local software exception handlers within software 
modules. 


INSTRUCTION SET OVERVIEW 

All IDT79R3010 instructions are 32 bits long and they can be
divided into the following groups: 
• Load/Store and Move instructions move data between 
  memory, the main processor and the FPA general registers. 
• Computational instructions perform arithmetic operations on 
  floating point values in the FPA registers. 
• Conversion instructions perform conversion operations 
  between the various data formats. 
• Compare instructions perform comparisons of the contents of 
  registers and set a condition bit based on the results. The 
  result of the compare operation is output on the FpCond 
  output of the FPA, which is typically used as CpCond1 on the 
  CPU for use in coprocessor branch operations. 
Table 1 lists the instruction set of the IDT79R3010 FPA. 


Load/Store/Move Instructions
  LWC1         Load Word to FPA
  SWC1         Store Word from FPA
  MTC1         Move Word to FPA
  MFC1         Move Word from FPA
  CTC1         Move Control word to FPA
  CFC1         Move Control word from FPA

Computational Instructions
  ADD.fmt      Floating-point Add
  SUB.fmt      Floating-point Subtract
  MUL.fmt      Floating-point Multiply
  DIV.fmt      Floating-point Divide
  ABS.fmt      Floating-point Absolute value
  MOV.fmt      Floating-point Move
  NEG.fmt      Floating-point Negate

Conversion Instructions
  CVT.S.fmt    Floating-point Convert to Single FP
  CVT.D.fmt    Floating-point Convert to Double FP
  CVT.W.fmt    Floating-point Convert to fixed-point

Compare Instructions
  C.cond.fmt   Floating-point Compare


Table 1. IDT79R3010 Instruction Summary 
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IDT79R3010 
RISC FLOATING POINT ACCELERATOR (FPA) 


IDT79R3010 PIPELINE ARCHITECTURE 


The IDT79R3010 FPA provides an instruction pipeline that par- 
allels that of the IDT79R3000 processor. The FPA, however, has a 
6-stage pipeline instead of the 5-stage pipeline of the 
IDT79R3000: the additional FPA pipe stage is used to provide effi- 
cient coordination of exception responses between the FPA and 
main processor. 

The execution of a single IDT79R3010 instruction consists of six 
primary steps: 

1) IF: Instruction Fetch. The main processor calculates the 
instruction address required to read an instruction from the 
I-Cache. No action is required of the FPA during this pipe 
stage since the main processor is responsible for address 
generation. 

2) RD: The instruction is present on the data bus during phase 
1 of this pipe stage and the FPA decodes the data on the bus 
to determine if it is an instruction for the FPA. 

3) ALU: If the instruction is an FPA instruction, instruction 
execution commences during this pipe stage. 

4) MEM: If this is a coprocessor load or store instruction, the 
FPA presents or captures the data during phase 2 of this pipe 
stage. 

5) WB: The FPA uses this pipe stage solely to deal with 
exceptions. 

6) FWB: The FPA uses this stage to write back ALU results to 
its register file. This stage is the equivalent of the WB stage 
in the IDT79R3000 main processor. 

Each of these steps requires approximately one FPA cycle as 
shown in Figure 4 (parts of some operations spill over into another 
cycle while other operations require only 1/2 cycle). 


[Diagram: Instruction Execution. Successive steps, each about one cycle long: I-Cache, RF, OP, D-Cache, WB.]


Figure 4. IDT79R3010 Instruction Summary 


The IDT79R3010 uses a 6-stage pipeline to achieve an instruction
execution rate approaching one instruction per FPA cycle. 
Thus, the execution of six instructions at a time is overlapped, as 
shown in Figure 5. 


[Pipeline diagram. Along the Instruction Flow axis, each successive instruction passes through IF, RD, ALU, MEM, WB and FWB, starting one cycle after the previous instruction. The Current Cycle marker crosses six instructions, one in each stage.]


Figure 5. IDT79R3010 Instruction Pipeline 


This pipeline operates efficiently because different FPA re- 
sources (address and data bus accesses, ALU operations, regis- 
ter accesses, and so on) are utilized on a non-interfering basis. 
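
The overlap shown in Figure 5 can be reproduced with a few lines of Python. This is a minimal sketch that assumes an ideal pipeline with no stalls or exceptions, with one instruction issued per cycle; the stage names come from the list above, while the function names are hypothetical.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB", "FWB"]


def stages_in_cycle(cycle: int, n_instructions: int) -> dict:
    """Map instruction index -> stage occupied in the given cycle (0-based).

    Assumes instruction i enters IF in cycle i and never stalls.
    """
    active = {}
    for i in range(n_instructions):
        k = cycle - i  # number of cycles instruction i has been in flight
        if 0 <= k < len(STAGES):
            active[i] = STAGES[k]
    return active


def pipeline_chart(n_instructions: int) -> str:
    """Render a text chart with one row per instruction, one column per cycle."""
    rows = []
    for i in range(n_instructions):
        cells = ["   "] * i + [s.ljust(3) for s in STAGES]
        rows.append(" | ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(pipeline_chart(6))
    print(stages_in_cycle(5, 6))
```

In cycle 5 of this model, six instructions are in flight at once, one per stage, which is the steady-state condition the figure depicts.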
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PIN CONFIGURATION 







































84-Pin J-Bend CERQUAD (Top View) 

[Package outline. Pins legible along the left side, top to bottom: Clk2xRd (pin 12), FpSysIn (13), Data (31) (14), VCC (15), GND1, DataP (3), FpSysOut, Clk2xSys, Clk2xSmp, FpSync, VCC2, GND2, VCC3, GND3, PLLOn, VCC4, GND4, VCC5, GND5. Pins legible along the right side, top to bottom: GND13, DataP (1), VCC12, GND12, FpCond, FpBusy, FpInt, Exception, Run, VCC11, GND11, VCC10, GND10, FpPresent, Resvd0, VCC9, GND9, VCC8, GND8.]

Note: 


Reserved pins must not be connected. 
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TA RY a IS EY ET EN AT 78 SE IA AEE OE 2A DY OSS LETS TY AIT SS RE ED ESI ESE ET A DET TE EGE NET IOS POO SCENE AEN LY A TT EAE ELIE SERGE I EL ET NES ES FOLATE LD ES 


PIN CONFIGURATION 
(Ceramic, Cavity Down)— BOTTOM VIEW 






[Pin grid diagram: 84-Pin Ceramic Pin Grid Array, rows A through M, columns 1 through 12.]



NOTE: 
1. Reserved pins must not be connected. 
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PIN CONFIGURATION 
84—L QUAD FLATPACK (CAVITY DOWN) 





























TOP VIEW 
[Package outline. Pins legible along the two vertical sides, listed top to bottom:]

Left side      Right side
Data (30)      Data (0)
Data (29)      Data (1)
Data (28)      Data (2)
Data (27)      Data (3)
VCC0           GND6
GND0           VCC6
Data (26)      Data (4)
Data (25)      Data (5)
Data (24)      Data (6)
DataP (2)      Data (7)
Data (22)      Data (8)
Data (21)      Data (9)
Data (20)      Data (10)
VCC14          Data (11)
GND14          GND7
Data (19)      VCC7
Data (18)      Data (12)
Data (17)      Data (13)
Data (16)      Data (14)
VCC13          Data (15)

NOTE: 


1. Reserved pins must not be connected. 
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PIN DESCRIPTIONS 


PIN NAME       DESCRIPTION

Data (0-31)    A multiplexed 32-bit bus used for instruction and data transfers on phase 1 and phase 2, respectively.

DataP (0-3)    A 4-bit bus containing even parity over the data bus. Parity is generated by the FPA on stores.

Run            Input to the FPA which indicates whether the processor-coprocessor system is in the run or stall state.

Exception      Input to the FPA which indicates exception related status information.

FpBusy         Signal to the CPU indicating a request for a coprocessor busy stall.

FpCond         Signal to the CPU indicating the result of the last comparison operation.

FpInt          Signal to the CPU indicating that a floating-point exception has occurred for the current FPA instruction.

Reset          Synchronous initialization input used to distinguish the processor-FPA synchronization period from the
               execution period. Reset must be synchronized by the leading edge of SysOut from the CPU.

PllOn          Input which during the reset period determines whether the phase lock mechanism is enabled and during the
               execution period determines the output timing model.

FpPresent      Output which is pulled to ground through an impedance of approximately 0.5k ohms. By providing an external
               pullup on this line, an indication of the presence or absence of the FPA can be obtained.

Clk2xSys       A double frequency clock input used for generating FpSysOut.
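
The DataP description above states only that the four parity lines carry even parity over the 32-bit data bus. The Python sketch below illustrates one plausible arrangement: it assumes one parity bit per byte lane, with DataP(n) covering Data bits 8n+7..8n. That byte-to-bit assignment is an assumption made for illustration and is not confirmed by this data sheet.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the count of 1s (data plus parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(word: int) -> list:
    """Return [DataP(0), DataP(1), DataP(2), DataP(3)] for a 32-bit word.

    Assumption: DataP(n) protects byte lane n (Data bits 8n+7 .. 8n).
    """
    return [even_parity_bit((word >> (8 * lane)) & 0xFF) for lane in range(4)]


def parity_ok(word: int, datap: list) -> bool:
    """Check a received word against its four parity bits."""
    return data_parity(word) == list(datap)


if __name__ == "__main__":
    word = 0x80FF0301
    print(f"word = 0x{word:08X}, DataP(0..3) = {data_parity(word)}")
```

On a store, the FPA would drive bits generated this way; a receiver would recompute them and compare, as `parity_ok` does.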
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ABSOLUTE MAXIMUM RATINGS": ®) 


SYMBOL   RATING                                  COMMERCIAL      MILITARY        UNIT
VTERM    Terminal Voltage With Respect to GND    -0.5 to +7.0    -0.5 to +7.0    V
TA       Operating Temperature                   0 to +70        -55 to +125     °C
TBIAS    Temperature Under Bias                  -55 to +125     -65 to +135     °C
TSTG     Storage Temperature                     -55 to +125     -65 to +150     °C
VIN      Input Voltage                           -0.5 to +7.0    -0.5 to +7.0    V


NOTES: 


1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
   may cause permanent damage to the device. This is a stress rating 
   only and functional operation of the device at these or any other conditions 
   above those indicated in the operational sections of this specification is not 
   implied. Exposure to absolute maximum rating conditions for extended
   periods may affect reliability. 

2. VIN minimum = -3.0V for pulse width less than 15ns. 
   VIN should not exceed VCC +0.5 Volts. 

3. Not more than one output should be shorted at a time. Duration of the short 
   should not exceed 30 seconds. 


DC ELECTRICAL CHARACTERISTICS — 




RECOMMENDED OPERATING 
TEMPERATURE AND SUPPLY VOLTAGE 


GRADE      TEMPERATURE          GND    VCC
Military   -55°C to +125°C      0V     5.0 ± 10%


OUTPUT LOADING FOR AC TESTING 


















To Device 
Under Test 





COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0 V ± 5%) 







Input HIGH Voltage) 


NOTES: 

3. These parameters do not apply to the clock inputs. 
4. VIHC and VILC apply to Run, PllOn and Exception. 
5. VOLFP applies to the FpPresent pin only. 
6. VIH and VIHS should not be held above VCC + 0.5 Volts. 
7. Guaranteed by design. 


Vor | Ouipit HIGH Votage | Veo=Minig=—ma [as — [as — | 
vor | output ow votage | Voo=Min lo.=4na | — oa |— o4| — 0a | 
[vous | ouput ow Vorage® | Voo=Minoc=isma | — os | — os | — os | 

pee — Teo — Peo 
[Vu | inputLow votaye | __——SC~dt or | — oe — 0 | 
Tvs | input HIGH Votegs@ | _——~ipao —|ao —|[so —| 
Fvus | tnputtow vote | SSS | — on | — 0 | 
"vnc | input tiGH vonage’ | | 4o_— 40 —|40 —| 
Fvuc | tnputLow votage | _———SsSSSSidP os | — 04 | — 0. | 
[cn | inpurCapacitanes™ | ——SSidP = to To | — 10 
cour | oupit Capactanes™ | ___—ip— 10 | — 10 | = 10 | 
Ti. | input LOW Leakage [Vn =GND | 1010 | 1010 | -10_10 | 
oz | Oust Ticstato Leakage | Vou=24V, Va -05v [40 40 | —40 «0 | 4040 | 





ome ac] wan Ma | ONT 
rss — Ta 














1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5V for longer periods. 
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, FpSysIn, FpSync and Reset. 
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DC ELECTRICAL CHARACTERISTICS — 
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, VCC = +5.0 V ± 10%) 


SYMBOL   PARAMETER                   TEST CONDITIONS
VOH      Output HIGH Voltage         VCC = Min., IOH = -4mA
VOL      Output LOW Voltage          VCC = Min., IOL = 4mA
VOLFP    Output LOW Voltage(5)       VCC = Min., IOL = 1.5mA
VIH      Input HIGH Voltage
VIL
VIHS
VILS
         Output Capacitance
         Operating Current
         Output Tri-state Leakage


NOTES: 
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5V for longer periods. 
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, FpSysIn, FpSync and Reset. 
3. These parameters do not apply to the clock inputs. 
4. VIHC and VILC apply to Run, PllOn and Exception. 
5. VOLFP applies to the FpPresent pin only. 
6. VIH, VIHC and VIHS should not be held above VCC + 0.5 V. 
7. Guaranteed by design. 
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AC ELECTRICAL CHARACTERISTICS — (9) 
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0 V ± 5%) 


0 MH 
SYMBOL PARAMETER TEST CONDITION] gioeo tt | SC IRL, Mae one ee UNIT 








Clock 
Input Clock High’2) Transition < High 
Input Clock Low(2) Transition < High 

































Input Clock Period 30 1000} 25 1000] 20 

oe Clk2xSys to Clk2xSmp\5) 0 tcyc/4) 0 teyc/4} 0 tcyc/4] 0 
Clk2xSmp to Clk2xRd/>) 0 tcyc/4] 0 tcyc/4] 0 
Clk2xSmp to Clk2xPhi() 9 tcyc/4| 7 tcyc/4| 5 





Timing Parameters 


Trsos | ResetSonup | 
Tos Datasemup dT 
en 
[Trscon [Fp condin sd 
Trea [Fosuy Sd 
Tram | Fomene dT 
Paice 
aaa 
ee 
aera! 
a 










_ ! 
wo _ 


| 
wo 
on 


Reset Initialization 


Reset Timing, Phase—lock on(4- 5) | 3000 — 3000 — 
Reset Timing, Phase-lock off) | | 128 = — | 128 — [128 — | 


Capacitive Load Deration 


| CLD | toadDeratel) | fo st fost 


NOTES: 
1. All timings are referenced to 1.5V. 
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. 
3. This parameter is guaranteed by design. 
4. With PllOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds. 
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock). 
6. No two signals on a given device will derate for a given load by a difference greater than 15%. 
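
Note 4 gives the minimum Reset assertion with the phase lock enabled as the longer of 3000 clock cycles or 200 microseconds. The Python sketch below works that rule out for a given clock rate. It assumes that "clock cycles" means CPU cycles (Tcyc, as defined in note 5) rather than cycles of the 2x clock; the function names are illustrative only.

```python
import math

MIN_RESET_CYCLES = 3000   # from note 4
MIN_RESET_US = 200.0      # from note 4


def min_reset_time_us(clock_mhz: float) -> float:
    """Minimum Reset assertion time in microseconds with PllOn asserted."""
    # A clock of f MHz completes f cycles per microsecond.
    time_for_cycles_us = MIN_RESET_CYCLES / clock_mhz
    return max(time_for_cycles_us, MIN_RESET_US)


def min_reset_cycles(clock_mhz: float) -> int:
    """Minimum Reset assertion expressed in whole CPU cycles."""
    return max(MIN_RESET_CYCLES, math.ceil(MIN_RESET_US * clock_mhz))


if __name__ == "__main__":
    for mhz in (12.5, 25.0):
        print(f"{mhz} MHz: {min_reset_time_us(mhz)} us, "
              f"{min_reset_cycles(mhz)} cycles")
```

Under this assumption the 3000-cycle term governs at 12.5 MHz (240 microseconds), while the 200-microsecond term governs at 25 MHz (5000 cycles).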


— 35 
as, AG 
10 — 


| 
ie) 
on 


so ]~]| | | 
mimiet: 8 © 









i 









QE Ow = 
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Tekp 

































AC ELECTRICAL CHARACTERISTICS (1, 3)
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, VCC = +5.0 V ± 10%) 

SYMBOL   PARAMETER                            TEST CONDITION

Clock
         Input Clock High(2)                  Transition < High
         Input Clock Low(2)                   Transition < High
         Clk2xSys to Clk2xSmp(5)
         Clk2xSmp to Clk2xRd(5)
         Clk2xSmp to Clk2xPhi(5)

Timing Parameters
         Data Enable
         Data Disable
         Data Valid                           Load = 25pF
         Reset Set-up
TDS      Data Set-up
         Data Hold
         Fp Condition
         Fp Busy
         Fp Interrupt
         Fp Move To
         Exception Set-up
         Exception Hold
         Run Hold

Reset Initialization
         Reset Timing, Phase-lock on(4, 5)
TRST     Reset Timing, Phase-lock off(5)

Capacitive Load Deration
         Load Derate(6)

NOTES: 
1. All timings are referenced to 1.5V. 
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. 
3. This parameter is guaranteed by design. 
4. With PllOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds. 
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock). 
6. No two signals on a given device will derate for a given load by a difference greater than 15%. 
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Tcklow Tckp 
[Timing diagram. Inputs: Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi. Timing parameters: Tcklow, Tckp, Tsmp, Trd, Tsys.]


Figure 6. Input “2x” Clock Timing 


[Timing diagram. Signals: FpSysOut, FpSmpOut†, FpRdOut†, FpPhiOut†.]

† These signals are not actually output from the floating point accelerator. They are drawn to provide 
a reference for other timing diagrams. 


Figure 7. Processor Reference Clock 
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FPA Load 





FPA Store 

[Timing diagram. Signals: Phase, FpSysOut, FpPhiOut, Data Bus.]





Figure 8. Floating Point Load/Store Timing 








[Timing diagram: MoveTo MEM Access followed by MoveTo Writeback. Signals: Phase, FpSysOut, FpPhiOut, FpCond. Timing parameters: Tsys, Tfpmov.]


Figure 9. Move to FPC Status Timing 





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FP ALU FP Mem 


Phase 1 2 1 


FpSysOut 





Figure 10. Floating Point Interrupt Timing 





[Timing diagram: FpCompare ALU stage followed by FpCompare MEM stage. Signals: Phase, FpSysOut, FpPhiOut, FpCond. Timing parameters: Tsys, Tfpcond.]





Figure 11. Floating Point Condition Timing 
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Phase 





[Timing diagram. Signals: FpSysOut, FpPhiOut, FpBusy, Exception. Timing parameter: Tsys.]





Figure 12. Floating Point Busy, Exception Timing 


Phase 




















Figure 13. Power-On Reset Timing 
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ORDERING INFORMATION 


IDT   XXXXX         XX       X         X
      Device Type   Speed    Package   Process/Temp. Range

Process/Temp. Range
  Blank     Commercial (0°C to +70°C)
  B         Military (-55°C to +125°C),
            Compliant to MIL-STD-883, Class B
  M         Military Temperature Range Only

Package
  F         84-Pin Quad Flatpack
  G         84-Pin PGA
  QJ        84-Pin J-Bend Cerpack

Speed
  16        16.67 MHz
  20        20.0 MHz
  25        25.0 MHz
  33        33.33 MHz

Device Type
  79R3010   Floating Point Accelerator







RISC FLOATING-POINT IDT79R3010A 
ACCELERATOR (FPA) IDT79R3010AE 


Integrated Device Technology, Inc. 


FEATURES: 


• Hardware Support of Single- and Double-Precision 
  Operations: 
  - Floating-Point Add 
  - Floating-Point Subtract 
  - Floating-Point Multiply 
  - Floating-Point Divide 
  - Floating-Point Comparisons 
  - Floating-Point Conversions 
• Sustained performance: 
  - 11 MFLOPS single precision LINPACK 
  - 7.3 MFLOPS double precision LINPACK 
• 16.7MHz through 40MHz operation 
• Direct, high-speed interface with IDT79R3000 Processor 
• Supports Full Conformance With IEEE 754-1985 
  Floating-Point Specification 
• Full 64-bit operation using sixteen 64-bit data registers 
• High-speed CEMOS™ technology 
• Military product compliant to MIL-STD-883, Class B 
• 32-bit status/control register providing access to all IEEE-
  Standard exception handling 
• Load/store architecture allows data movement directly 
  between FPA and memory or between CPU and FPA 
• Overlapped operation of independent floating point ALUs 
• Fully pin-compatible with IDT79R3010/IDT79R3010L 


DESCRIPTION: 


The IDT79R3010A Floating-Point Accelerator (FPA) operates in 
conjunction with the IDT79R3000A Processor and extends the 
IDT79R3000A's instruction set to perform arithmetic operations on 
values in floating-point representations. The IDT79R3010A FPA, 
with associated system software, fully conforms to the require- 
ments of ANSI/IEEE Standard 754—1985, “IEEE Standard for Bi- 
nary Floating-Point Arithmetic.” In addition, the architecture fully 
supports the standard’s recommendations. 

This data sheet provides an overview of the features and archi- 
tecture of the 79R3010A FPA. A more detailed description of the 
operation of the device is incorporated in the “R3000A Family 
Hardware User's Manual”, and a more detailed architectural over- 
view is provided in the “mips RISC Architecture” book, both avail- 
able from IDT. 











[Block diagram. Blocks: Register Unit (16 x 64), Exponent Unit, Multiply Unit, Round, Control, Clocks. Signals and labels: Cache Data, Data Bus, Instructions, Operands, Exponent Part, Fraction, B, Result, Reset, FpBusy, Exception.]


Figure 1. IDT79R3010A Functional Block Diagram 


CEMOS is a trademark of Integrated Device Technology, Inc. 





MILITARY AND COMMERCIAL TEMPERATURE RANGES 


SEPTEMBER 1990 





© 1990 Integrated Device Technology, Inc. 


400 DSC-9039/- 




IDT79R3010A FPA REGISTERS 


The IDT79R3010A FPA provides 32 general purpose 32-bit
registers, a Control/Status register, and a Revision Identification 
register. The tightly-coupled coprocessor interface causes the
register resources of the FPA to appear to the systems programmers 
as an extension of the CPU internal registers. The FPA registers 
are shown in Figure 2. 


[Register diagram. General Purpose Registers (FGR/FPR): each 64-bit row holds an odd FGR (FGR1, FGR3, ...) in bits 63..32 and an even FGR (FGR0, FGR2, ...) in bits 31..0. Control/Status Register, bits 31..0: Exceptions/Enables/Modes. Implementation/Revision Register, bits 31..0.]


Figure 2. IDT79R3010A FPA Registers 


Floating-point coprocessor operations reference three types of 
registers: 


• Floating-Point Control Registers (FCR) 
• Floating-Point General Registers (FGR) 
• Floating-Point Registers (FPR) 


Floating-Point General Registers (FGR) 


There are 32 Floating-Point General Registers (FGR) on the 
FPA. They represent directly-addressable 32-bit registers, and 
can be accessed by Load, Store, or Move Operations. 


Floating-Point Registers (FPR) 


The 32 FGRs described in the preceding paragraph are also 
used to form sixteen 64-bit Floating-Point Registers (FPR). Pairs 
of general registers (FGRs), for example FGR0 and FGR1 (refer to 
Figure 2), are physically combined to form a single 64-bit FPR. The 
FPRs hold a value in either single- or double-precision
floating-point format. Double-precision format FPRs are formed from two 
adjacent FGRs. 


Floating-Point Control Registers (FCR) 


There are 2 Floating-Point Control Registers (FCR) on the FPA. 
They can be accessed only by Move operations and include the 
following: 

e Control/Status register, used to control and monitor 
exceptions, operating modes, and rounding modes; 


e Revision register, containing revision information about the 
FPA. 


COPROCESSOR OPERATION 


The FPA continually monitors the IDT79R3000A processor in- 
struction stream. If an instruction does not apply to the 
coprocessor, it is ignored; if an instruction does apply to the 
coprocessor, the FPA executes that instruction and transfers nec- 
essary result and exception data synchronously to the 
IDT79R3000A main processor. 


The FPA performs three types of operations: 
• Loads and Stores; 
• Moves; 
• Two- and three-register floating-point operations. 


Load, Store, and Move Operations 

Load, Store, and Move operations move data between memory 
or the IDT79R3000A Processor registers and the IDT79R3010A 
FPA registers. These operations perform no format conversions 
and cause no floating-point exceptions. Load, Store, and Move 
operations reference a single 32-bit word of either the Floating- 
Point General Registers (FGR) or the Floating-Point Control Reg- 
isters (FCR). 


Floating-Point Operations 


The FPA supports the following single- and double-precision for- 
mat floating-point operations: 
• Add 
• Subtract 
• Multiply 
• Divide 
• Absolute Value 
• Move 
• Negate 
• Compare 
In addition, the FPA supports conversions between single- and 
double-precision floating-point formats and fixed-point formats. 
The FPA incorporates separate Add/Subtract, Multiply, and Di- 
vide units, each capable of independent and concurrent operation. 
Thus, to achieve very high performance, floating point divides can 
be overlapped with floating point multiplies and floating point addi- 
tions. These floating point operations occur independently of the 
actions of the CPU, allowing further overlap of integer and floating 
point operations. Figure 3 illustrates an example of the types of 
overlap permissible. 
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Only Load, Store, and Move operations 
are permitted in FPA during these cycles. 


Other FPA instructions can proceed during 
these cycles. However, two multiply or two 
divide operations cannot be overlapped. 


These cycles are free for integer operations
in the CPU. 

Figure 3. Examples of Overlapping Floating Point Operation 

Exceptions 

The IDT79R3010A FPA supports all five IEEE standard 
exceptions: 
• Invalid Operation 
• Inexact Operation 
• Division by Zero 
• Overflow 
• Underflow 
The FPA also supports the optional Unimplemented Operation 
exception that allows unimplemented instructions to trap to
software emulation routines. 
The FPA provides precise exception capability to the CPU; that 
is, the execution of a floating point operation which generates an 
exception causes that exception to occur at the CPU instruction 
which caused the operation. This precise exception capability is a 
requirement in applications and languages which provide a 
mechanism for local software exception handlers within software 
modules. 


INSTRUCTION SET OVERVIEW 

All IDT79R3010 instructions are 32 bits long and they can be
divided into the following groups: 
• Load/Store and Move instructions move data between 
  memory, the main processor and the FPA general registers. 
• Computational instructions perform arithmetic operations on 
  floating point values in the FPA registers. 
• Conversion instructions perform conversion operations 
  between the various data formats. 
• Compare instructions perform comparisons of the contents of 
  registers and set a condition bit based on the results. The 
  result of the compare operation is output on the FpCond 
  output of the FPA, which is typically used as CpCond1 on the 
  CPU for use in coprocessor branch operations. 
Table 1 lists the instruction set of the IDT79R3010A FPA. 
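
To make the five IEEE standard exceptions concrete, the following Python sketch classifies the result of a single double-precision operation. It is a software model built on Python's own IEEE 754 doubles and exact rational arithmetic, not a description of how the FPA detects exceptions. It handles only finite operands and flags underflow only when a tiny result is also inexact, which is an assumption about the non-trapping case. The function name `ieee_flags` is hypothetical.

```python
from fractions import Fraction
import math
import sys


def ieee_flags(op: str, a: float, b: float) -> set:
    """Return the IEEE exception names raised by `a op b` (finite operands).

    op is one of "add", "sub", "mul", "div".
    """
    if not (math.isfinite(a) and math.isfinite(b)):
        raise ValueError("this sketch only models finite operands")
    if op == "div" and b == 0.0:
        # 0/0 has no meaningful result; x/0 with x != 0 is an exact infinity.
        return {"invalid operation"} if a == 0.0 else {"division by zero"}
    if op == "add":
        exact, rounded = Fraction(a) + Fraction(b), a + b
    elif op == "sub":
        exact, rounded = Fraction(a) - Fraction(b), a - b
    elif op == "mul":
        exact, rounded = Fraction(a) * Fraction(b), a * b
    elif op == "div":
        exact, rounded = Fraction(a) / Fraction(b), a / b
    else:
        raise ValueError(f"unknown operation: {op}")
    if math.isinf(rounded):
        # Finite operands produced an infinity: the result overflowed.
        return {"overflow", "inexact operation"}
    flags = set()
    if Fraction(rounded) != exact:
        flags.add("inexact operation")
        if abs(rounded) < sys.float_info.min:
            flags.add("underflow")
    return flags


if __name__ == "__main__":
    print(ieee_flags("div", 1.0, 0.0))
    print(ieee_flags("add", 0.1, 0.2))
    print(ieee_flags("mul", 1e308, 10.0))
```

With precise exceptions, a handler invoked for any of these conditions can identify the exact instruction that produced it.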


Load/Store/Move Instructions
  LWC1         Load Word to FPA
  SWC1         Store Word from FPA
  MTC1         Move Word to FPA
  MFC1         Move Word from FPA
  CTC1         Move Control word to FPA
  CFC1         Move Control word from FPA

Computational Instructions
  ADD.fmt      Floating-point Add
  SUB.fmt      Floating-point Subtract
  MUL.fmt      Floating-point Multiply
  DIV.fmt      Floating-point Divide
  ABS.fmt      Floating-point Absolute value
  MOV.fmt      Floating-point Move
  NEG.fmt      Floating-point Negate

Conversion Instructions
  CVT.S.fmt    Floating-point Convert to Single FP
  CVT.D.fmt    Floating-point Convert to Double FP
  CVT.W.fmt    Floating-point Convert to fixed-point

Compare Instructions
  C.cond.fmt   Floating-point Compare


Table 1. IDT79R3010A Instruction Summary 







IDT79R3010 PIPELINE ARCHITECTURE 


The IDT79R3010A FPA provides an instruction pipeline that par- 
allels that of the IDT79R3000A processor. The FPA, however, has 
a 6-stage pipeline instead of the 5-stage pipeline of the 
IDT79R3000: the additional FPA pipe stage is used to provide effi- 
cient coordination of exception responses between the FPA and 
main processor. 

The execution of a single IDT79R3010A instruction consists of 
six primary steps: 

1) IF: Instruction Fetch. The main processor calculates the 
instruction address required to read an instruction from the 
I-Cache. No action is required of the FPA during this pipe 
stage since the main processor is responsible for address 
generation. 

2) RD: The instruction is present on the data bus during phase 
1 of this pipe stage and the FPA decodes the data on the bus 
to determine if it is an instruction for the FPA. 

3) ALU: If the instruction is an FPA instruction, instruction 
execution commences during this pipe stage. 

4) MEM: If this is a coprocessor load or store instruction, the 
FPA presents or captures the data during phase 2 of this pipe 
stage. 

5) WB: The FPA uses this pipe stage solely to deal with 
exceptions. 

6) FWB: The FPA uses this stage to write back ALU results to 
its register file. This stage is the equivalent of the WB stage 
in the IDT79R3000A main processor. 

Each of these steps requires approximately one FPA cycle as
shown in Figure 4 (parts of some operations spill over into another
cycle while other operations require only 1/2 cycle).


[Figure: Instruction Execution. Six successive one-cycle stages
labeled I-Cache, RF, OP, D-Cache, Exceptions, FpWB]

Figure 4. IDT79R3010A Instruction Summary


The IDT79R3010A uses a 6-stage pipeline to achieve an instruction
execution rate approaching one instruction per FPA cycle. Thus,
the execution of six instructions at a time is overlapped as
shown in Figure 5.






[Figure: Instruction Flow. Successive instructions each pass through
the IF, RD, ALU, MEM, WB and FWB stages, each instruction starting
one cycle after the previous one; the Current Cycle is marked]

Figure 5. IDT79R3010A Instruction Pipeline


This pipeline operates efficiently because different FPA
resources (address and data bus accesses, ALU operations,
register accesses, and so on) are utilized on a non-interfering basis.
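
The overlap described above can be illustrated with a small sketch. It is
an idealized model only: it assumes one instruction enters the pipeline per
cycle and that no stalls or exceptions occur. The function names
(`pipeline_chart`, `stages_in_cycle`) are hypothetical and chosen for
illustration.

```python
# Idealized model of the 6-stage FPA pipeline (no stalls, no exceptions).
STAGES = ["IF", "RD", "ALU", "MEM", "WB", "FWB"]


def pipeline_chart(num_instructions):
    """Return one row per instruction; row[c] is the stage active in cycle c."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows


def stages_in_cycle(rows, cycle):
    """List the stages occupied in a given cycle, oldest instruction first."""
    return [row[cycle] for row in rows if row[cycle] != "--"]


chart = pipeline_chart(8)
for row in chart:
    print(" ".join(f"{cell:>3}" for cell in row))
```

Once the pipeline is full, every cycle has six instructions in flight, one
in each stage, which is why the execution rate approaches one instruction
per FPA cycle in this simplified view.
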


PACKAGE THERMAL SPECIFICATIONS


The IDT79R3010A utilizes special packaging techniques to
improve both the thermal and electrical characteristics of the
floating point accelerator.

In order to improve the electrical characteristics of the device, the
package is constructed using multiple signal planes, including
individual power planes and ground planes to reduce noise
associated with high-frequency TTL parts.

In order to improve the thermal characteristics of the floating
point accelerator, the device is housed using cavity down packaging
for the flatpack and the PGA (the J-bend CerQuad is cavity up).
In addition, these packages incorporate a copper-tungsten thermal
slug designed to efficiently transfer heat from the die to the
case of the package, and thus effectively lower the thermal
resistance of the package. The use of an additional external heat sink
affixed to the package thermal slug further decreases the effective
thermal resistance of the package.

The case temperature may be measured in any environment to
determine whether the device is within the specified operating
range. The case temperature should be measured at the center of
the top surface opposite the package cavity (the package cavity is
the side where the package lid is mounted).

The equivalent allowable ambient temperature, TA, can be
calculated using the thermal resistance from case to ambient (θca) for
the given package. The following equation relates ambient and
case temperature:

    TA = Tc - P * θca

where P is the maximum power consumption, calculated by using
the maximum Icc from the DC Electrical Characteristics section.

Typical values for θca at various airflows are shown in Table 2 for
the various packages.











θca (84-PGA)        22
θca (84-Flatpack)   22
θca (84-CerQuad)    25


Table 2. Thermal Resistance (θca) at Various Airflows
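
As a worked illustration of the relation TA = Tc - P * θca, the sketch
below computes the allowable ambient temperature from a case temperature
limit, a power figure and a case-to-ambient thermal resistance. The
function names and all numeric inputs are hypothetical and chosen only to
show the arithmetic; real values should come from the DC Electrical
Characteristics section and Table 2.

```python
def max_power_w(icc_max_a, vcc_max_v):
    """Maximum power consumption P, taken here as maximum Icc times Vcc."""
    return icc_max_a * vcc_max_v


def allowable_ambient_c(case_temp_c, power_w, theta_ca_c_per_w):
    """TA = Tc - P * theta_ca, all temperatures in degrees Celsius."""
    return case_temp_c - power_w * theta_ca_c_per_w


# Hypothetical inputs: 0.5 A at 5.0 V, theta_ca of 10 degC/W, Tc limit 70 degC.
power = max_power_w(0.5, 5.0)
ambient = allowable_ambient_c(70.0, power, 10.0)
print(f"P = {power:.2f} W, allowable TA = {ambient:.1f} degC")
```

A lower θca (more airflow or an external heat sink) raises the allowable
ambient temperature for the same case temperature limit.
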
PIN CONFIGURATION
(Top View)

[Figure: 84-Pin J-Bend CERQUAD pinout. Legible pin labels include
Data (18), Data (19), Data (22) to Data (31), DataP (1) to DataP (3),
Clk2xRd, Clk2xSys, Clk2xSmp, Clk2xPhi, FpSysIn, FpSysOut, FpSync,
Reset, PLLOn, FpCond, Exception, Run, FpPresent, Resvd0, Resvd1,
Resvd2, and the VCC and GND supply pins]

Note:
Reserved pins must not be connected.
PIN CONFIGURATION
(Ceramic, Cavity Down) - BOTTOM VIEW

[Figure: 84-Pin Ceramic Pin Grid Array pinout. Rows are lettered A
to M and columns are numbered 1 to 12; legible pin labels include
Data and DataP signals, FpSysOut, Clk2xSmp, FpSync, Vcc and Vss]

NOTE:
1. Reserved pins must not be connected.
PIN CONFIGURATION
84-L QUAD FLATPACK (CAVITY DOWN)
TOP VIEW

[Figure: 84-lead quad flatpack pinout with heatsink. Legible pin
labels include Data (0) to Data (31), DataP (0) to DataP (2),
Clk2xRd, FpSysIn, FpCond, FpBusy, FpInt, Exception, Run, FpPresent,
Resvd1, Resvd2, and the VCC and GND supply pins]

NOTE:
1. Reserved pins must not be connected.





PIN DESCRIPTIONS 













PIN NAME    I/O   DESCRIPTION

Run         I     Input to the FPA which indicates whether the
                  processor-coprocessor system is in the run or
                  stall state.
Exception   I     Input to the FPA which indicates exception related
                  status information.
FpBusy      O     Signal to the CPU indicating a request for a
                  coprocessor busy stall.
FpCond      O     Signal to the CPU indicating the result of the last
                  comparison operation.
FpInt       O     Signal to the CPU indicating that a floating-point
                  exception has occurred for the current FPA
                  instruction.
Reset       I     Synchronous initialization input used to distinguish
                  the processor-FPA synchronization period from the
                  execution period. Reset must be synchronized by the
                  leading edge of SysOut from the CPU.
PllOn       I     Input which during the reset period determines
                  whether the phase lock mechanism is enabled and
                  during the execution period determines the output
                  timing model.
FpPresent   O     Output which is pulled to ground through an impedance
                  of approximately 0.5k ohms. By providing an external
                  pullup on this line, an indication of the presence or
                  absence of the FPA can be obtained.
Clk2xSys    I     A double frequency clock input used for generating
                  FpSysOut.
Clk2xSmp    I     A double frequency clock input used to determine the
                  sample point for data coming into the FPA.
Clk2xRd     I     A double frequency clock input used to determine the
                  disable point for the data drivers.
Clk2xPhi    I     A double frequency clock input used to determine the
                  position of the internal phases, phase 1 and phase 2.
FpSysOut    O     Synchronization clock from the FPA.
FpSysIn     I     Input used to receive the synchronization clock from
                  the FPA.
FpSync      I     Input used to receive the synchronization clock from
                  the CPU.


ABSOLUTE MAXIMUM RATINGS


SYMBOL  RATING                  COMMERCIAL         MILITARY       UNIT

VTERM   Terminal Voltage        -0.5 to +7.0       -0.5 to +7.0   V
        with Respect to GND
        Operating Temperature   0 to +70 (4, 5)    -55 to +125    °C
TSTG    Storage Temperature     -55 to +125        -65 to +150    °C
VIN     Input Voltage (2)       -0.5 to +7.0       -0.5 to +7.0   V


NOTES:

1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
   may cause permanent damage to the device. This is a stress rating
   only and functional operation of the device at these or any other
   conditions above those indicated in the operational sections of
   this specification is not implied. Exposure to absolute maximum
   rating conditions for extended periods may affect reliability.

2. VIN minimum = -3.0V for pulse width less than 15ns.
   VIN should not exceed VCC + 0.5 Volts.

3. Not more than one output should be shorted at a time. Duration of
   the short should not exceed 30 seconds.

4. 16-33MHz Only, 0 to +70 (Ambient)

5. 37-40MHz Only, 0 to +70 (Case)


AC TEST CONDITIONS


SYMBOL  PARAMETER            MIN.   MAX.   UNIT

VIH     Input HIGH Voltage   3.0    -      V
VIL     Input LOW Voltage    -      0.4    V
VIHS    Input HIGH Voltage                 V
VILS    Input LOW Voltage    -      0.4    V
RECOMMENDED OPERATING
TEMPERATURE AND SUPPLY VOLTAGE


GRADE        TEMPERATURE                 VCC

Military     -55°C to +125°C (Case)      5.0V ± 10%
Commercial   0°C to +70°C (Ambient)      5.0V ± 5%
Commercial   0°C to +70°C (Case)         5.0V ± 5%


OUTPUT LOADING FOR AC TESTING

[Figure: output load circuit connected to the device under test]


DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A -
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)


SYMBOL  PARAMETER            TEST CONDITIONS          MIN.   MAX.   UNIT

VOH     Output HIGH Voltage  VCC = Min, IOH = -4mA    3.5    -      V
VOL     Output LOW Voltage                                          V
VOLFP   Output LOW Voltage   VCC = Min, IOL = 1.5mA                 V
VIH     Input HIGH Voltage                                          V
VIL     Input LOW Voltage                                           V
VIHS    Input HIGH Voltage                                          V
VILS    Input LOW Voltage                                           V
VIHC    Input HIGH Voltage                                          V
VILC    Input LOW Voltage                             -      0.4    V
CIN     Input Capacitance                                           pF
COUT    Output Capacitance                                          pF
ICC     Operating Current    VCC = 5.0V, TA = 70°C                  mA
IIL     Input LOW Leakage    VIN = GND                -10           µA











DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE -
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)


SYMBOL  PARAMETER            TEST CONDITIONS          MIN.   MAX.       UNIT

VOH     Output HIGH Voltage                           3.5    -          V
VOL     Output LOW Voltage                            -      0.4        V
VOLFP   Output LOW Voltage   VCC = Min, IOL = 1.5mA                     V
VIH     Input HIGH Voltage                                              V
VIL     Input LOW Voltage                                               V
VIHS    Input HIGH Voltage                                              V
VILS    Input LOW Voltage                                               V
VIHC    Input HIGH Voltage                                              V
VILC    Input LOW Voltage                                               V
CIN     Input Capacitance                                               pF
COUT    Output Capacitance                                              pF
ICC     Operating Current    VCC = 5.0V, TA = 70°C    -      650, 700   mA
IIH     Input HIGH Leakage   VIN = VCC                                  µA
IIL     Input LOW Leakage    VIN = GND                -10               µA
NOTES:

1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not
   fall below -0.5V for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi,
   FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PllOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above VCC + 0.5 Volts.
7. Guaranteed by design.


DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE -
COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +70°C, VCC = +5.0V ± 5%)


SYMBOL  PARAMETER                TEST CONDITIONS            MIN.   MAX.   UNIT

VOH     Output HIGH Voltage                                               V
VOL     Output LOW Voltage                                                V
VOLFP   Output LOW Voltage                                                V
VIH     Input HIGH Voltage                                                V
VIL     Input LOW Voltage                                                 V
VIHS    Input HIGH Voltage                                                V
VILS    Input LOW Voltage                                                 V
VIHC    Input HIGH Voltage                                                V
VILC    Input LOW Voltage                                                 V
CIN     Input Capacitance                                   -      10     pF
COUT    Output Capacitance                                                pF
ICC     Operating Current        VCC = 5.0V, TA = 70°C                    mA
IIH     Input HIGH Leakage       VIN = VCC                         10     µA
IIL     Input LOW Leakage        VIN = GND                                µA
IOZ     Output Tristate Leakage  VOH = 2.4V, VOL = 0.5V     -40    40     µA


NOTES:

1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not
   fall below -0.5V for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi,
   FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PllOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above VCC + 0.5 Volts.
7. Guaranteed by design.


DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A -
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)


SYMBOL  PARAMETER                TEST CONDITIONS            MIN.   MAX.   UNIT

VOH     Output HIGH Voltage      VCC = Min, IOH = -4mA                    V
VOL     Output LOW Voltage                                                V
VOLFP   Output LOW Voltage       VCC = Min                                V
VIH     Input HIGH Voltage                                                V
VIL     Input LOW Voltage                                                 V
VIHS    Input HIGH Voltage                                                V
VILS    Input LOW Voltage                                                 V
VIHC    Input HIGH Voltage                                                V
VILC    Input LOW Voltage                                                 V
CIN     Input Capacitance                                                 pF
COUT    Output Capacitance                                                pF
ICC     Operating Current        VCC = 5.0V, TA = 70°C      -      575    mA
IIH     Input HIGH Leakage       VIN = VCC                                µA
IIL     Input LOW Leakage        VIN = GND                  -10    10     µA
IOZ     Output Tristate Leakage  VOH = 2.4V, VOL = 0.5V     -40    40     µA



















DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE -
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)


SYMBOL  PARAMETER                TEST CONDITIONS            MIN.   MAX.   UNIT

VOH     Output HIGH Voltage      VCC = Min, IOH = -4mA      3.5    -      V
VOL     Output LOW Voltage                                                V
VOLFP   Output LOW Voltage       VCC = Min, IOL = 1.5mA                   V
VIH     Input HIGH Voltage                                                V
VIL     Input LOW Voltage                                                 V
VIHS    Input HIGH Voltage                                                V
VILS    Input LOW Voltage                                                 V
VIHC    Input HIGH Voltage                                                V
VILC    Input LOW Voltage                                                 V
CIN     Input Capacitance                                                 pF
COUT    Output Capacitance                                                pF
IIH     Input HIGH Leakage       VIN = VCC                                µA
IIL     Input LOW Leakage        VIN = GND                                µA
IOZ     Output Tristate Leakage  VOH = 2.4V, VOL = 0.5V                   µA


NOTES:

1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not
   fall below -0.5V for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi,
   FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PllOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above VCC + 0.5 Volts.
7. Guaranteed by design.


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A - (1, 3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)


SYMBOL    PARAMETER                       TEST CONDITIONS   MIN. MAX. (per speed grade)

Clock
TCkHigh   Input Clock High (2)            Note 7            12  -   10  -
TCkLow    Input Clock Low (2)             Note 7
TCkP      Input Clock Period
          Clk2xSys to Clk2xSmp (5)
          Clk2xSmp to Clk2xRd (5)
          Clk2xSmp to Clk2xPhi (5)

Timing Parameters
TDEn      Data Enable (3)
TDDis     Data Disable (3)
TDVal     Data Valid                      Load = 25pF
TDS       Data Set-up
TDH       Data Hold
TFpCond   FpCondition
TFpBusy
TFpMov    FpMoveTo
TRExS     Exception Set-up (Run Cycle)
TSExS     Exception Set-up (Stall Cycle)
TExH      Exception Hold
TRunS     Run Set-up
TRunH     Run Hold
TStallS   Stall Set-up
TStallH   Stall Hold

Reset Initialization
          Reset Timing, Phase-lock on
          Reset Timing, Phase-lock off

Capacitive Load Deration

NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys,
   Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PllOn asserted, Reset must be asserted for the longer of
   3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by
   a difference greater than 15%.
7. Clock transition time < 5ns.
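
The reset requirement with PllOn asserted (the longer of 3000 clock cycles
or 200 microseconds) can be checked with a short calculation. The sketch
below assumes that "clock cycles" means CPU cycles (Tcyc, two periods of a
2x clock), which matches the Tcyc units used for the reset rows of the AC
tables; the function names are hypothetical.

```python
def tcyc_ns(cpu_mhz):
    """CPU cycle time Tcyc in nanoseconds (two cycles of a 2x clock)."""
    return 1000.0 / cpu_mhz


def min_reset_us(cpu_mhz, cycles=3000, floor_us=200.0):
    """Minimum Reset assertion time in microseconds with PllOn asserted."""
    cycles_us = cycles * tcyc_ns(cpu_mhz) / 1000.0
    # The requirement is whichever of the two limits is longer.
    return max(cycles_us, floor_us)


for mhz in (12.5, 20.0, 25.0):
    print(f"{mhz:5.1f} MHz: Tcyc = {tcyc_ns(mhz):.1f} ns, "
          f"reset >= {min_reset_us(mhz):.1f} us")
```

Under this assumption, 3000 cycles take less than 200 microseconds at any
clock rate above 15 MHz, so the 200 microsecond floor is the binding limit
at the listed speed grades.
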


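AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE - (1, 3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)


SYMBOL    PARAMETER                        MIN.    UNIT

Clock
TCkHigh   Input Clock High (2)
TCkLow    Input Clock Low (2)
TCkP      Input Clock Period
          Clk2xSys to Clk2xSmp (5)
          Clk2xSmp to Clk2xRd (5)
          Clk2xSmp to Clk2xPhi (5)

Timing Parameters
TDEn      Data Enable (3)
TDDis     Data Disable (3)
TDVal     Data Valid
TDS       Data Set-up
TDH       Data Hold
TFpCond   FpCondition
TFpBusy
TFpInt
TFpMov    FpMoveTo
TRExS     Exception Set-up (Run Cycle)
TSExS     Exception Set-up (Stall Cycle)
TExH      Exception Hold
TRunS     Run Set-up
TRunH     Run Hold
TStallS   Stall Set-up
TStallH   Stall Hold

Reset Initialization
          Reset Timing, Phase-lock on      3000    Tcyc
          Reset Timing, Phase-lock off     128     Tcyc

Capacitive Load Deration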


NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys,
   Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PllOn asserted, Reset must be asserted for the longer of
   3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by
   a difference greater than 15%.
7. Clock transition time < 2.5ns for 33MHz; clock transition time
   < 5ns for all other speeds.


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE - (1, 3)
COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +70°C, VCC = +5.0V ± 5%)


SYMBOL    PARAMETER                        TEST CONDITIONS   UNIT

Clock
TCkHigh   Input Clock High (2)
TCkLow    Input Clock Low (2)
TCkP      Input Clock Period
          Clk2xSys to Clk2xSmp (5)
          Clk2xSmp to Clk2xRd (5)
          Clk2xSmp to Clk2xPhi (5)

Timing Parameters
TDEn      Data Enable (3)
TDVal     Data Valid                       Load = 25pF
TDS       Data Set-up
TDH       Data Hold (3)
TFpCond   FpCondition
TFpMov    FpMoveTo
TRExS     Exception Set-up (Run Cycle)
TSExS     Exception Set-up (Stall Cycle)
TExH      Exception Hold
TRunS     Run Set-up
TRunH     Run Hold
TStallS   Stall Set-up
TStallH   Stall Hold

Reset Initialization
          Reset Timing, Phase-lock on                        Tcyc
          Reset Timing, Phase-lock off                       Tcyc

Capacitive Load Deration


NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys,
   Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PllOn asserted, Reset must be asserted for the longer of
   3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by
   a difference greater than 15%.
7. Clock transition time < 2.5ns for 33MHz. Clock transition time
   < 5ns for 25MHz.


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A - (1, 3)
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)


SYMBOL    PARAMETER                        TEST CONDITIONS   UNIT

Clock
TCkHigh   Input Clock High (2)             Note 7
TCkLow    Input Clock Low (2)              Note 7
TCkP      Input Clock Period
          Clk2xSys to Clk2xSmp (5)
          Clk2xSmp to Clk2xRd (5)
          Clk2xSmp to Clk2xPhi (5)

Timing Parameters
TFpMov    FpMoveTo
TSExS     Exception Set-up (Stall Cycle)
TStallH   Stall Hold

Capacitive Load Deration
          Load Derate (6)                                    ns/25pF


NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys,
   Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PllOn asserted, Reset must be asserted for the longer of
   3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by
   a difference greater than 15%.
7. Clock transition time < 5ns.


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE - (1, 3)
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)


SYMBOL    PARAMETER                        MIN.    UNIT

Clock
TCkHigh   Input Clock High (2)
TCkLow    Input Clock Low (2)
TCkP      Input Clock Period
          Clk2xSys to Clk2xSmp (5)
          Clk2xSmp to Clk2xRd (5)
          Clk2xSmp to Clk2xPhi (5)

Timing Parameters
TDDis     Data Disable (3)
TFpMov    FpMoveTo
TRExS     Exception Set-up (Run Cycle)
TSExS     Exception Set-up (Stall Cycle)
TExH      Exception Hold
TRunS     Run Set-up
TRunH     Run Hold
TStallS   Stall Set-up
TStallH   Stall Hold

Reset Initialization
          Reset Timing, Phase-lock on      3000    Tcyc
          Reset Timing, Phase-lock off     128     Tcyc

Capacitive Load Deration


NOTES:

1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys,
   Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PllOn asserted, Reset must be asserted for the longer of
   3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by
   a difference greater than 15%.
7. Clock transition time < 2.5ns for 33MHz; clock transition time
   < 5ns for all other speeds.


[Figure: timing of the four 2x clock inputs Clk2xSys, Clk2xSmp,
Clk2xRd and Clk2xPhi, with the intervals TCkLow, TCkP, TSmp, TRd and
TSys marked]

Figure 6. Input "2x" Clock Timing


[Figure: reference clocks FpSysOut, FpSmpOut†, FpRdOut† and FpPhiOut†]

† These signals are not actually output from the floating point
accelerator. They are drawn to provide a reference for other timing
diagrams.

Figure 7. Processor Reference Clock
[Figure: FPA Store and FPA Load cycles, with traces for Phase,
FpSysOut, FpPhiOut and Data Bus; TStallS is marked]

Figure 8. Floating Point Load/Store Timing


[Figure: MoveTo MEM Access and MoveTo Writeback cycles, with traces
for Phase, FpSysOut, FpCond and FpInt; TFpMov is marked on both]

Figure 9. Move to FPC Status Timing
[Figure: traces for Phase and FpSysOut]

Figure 10. Floating Point Interrupt Timing


[Figure: FpCompare ALU and FpCompare MEM cycles, with traces for
Phase, FpSysOut, FpPhiOut and FpCond; TSys is marked]

Figure 11. Floating Point Condition Timing
[Figure: traces for Phase, FpSysOut, FpPhiOut, FpBusy, Exception and
Run; the stall set-up and hold (TStallS, TStallH) and exception
set-up and hold (TExH) intervals are marked]

Figure 12. Floating Point Busy, Exception Timing


[Figure: traces for Phase 1, Reset and Vcc]

Figure 13. Power-On Reset Timing


ORDERING INFORMATION


IDT   XXXXX         XX      X         X
      Device Type   Speed   Package   Process/Temp. Range

Process/Temp. Range
    Blank       Commercial (0°C to +70°C)
    B           Military (-55°C to +125°C)
                Compliant to MIL-STD-883, Class B
    M           Military Temperature Range Only

Package
    F           84-Pin Quad Flatpack (Cavity Down)
    G           84-Pin PGA (Cavity Down)
    QJ          84-Pin J-Bend Cerpack (Cavity Up)

Speed
    16          16.67 MHz
    20          20.0 MHz
    25          25.0 MHz
    33          33.33 MHz
    37          37 MHz
    40          40 MHz

Device Type
    79R3010A    Floating Point Accelerator
    79R3010AE   Enhanced Timing Floating Point Accelerator


DS-79R3010A-090
FEATURES:


• Temporary storage buffers to enhance the performance of the
  IDT79R3000 RISC CPU processor

• Allows for write operations by the RISC CPU processor during
  Run cycles

• Each Write Buffer has four locations to handle an 8-bit
  address slice and a 9-bit data slice (including a parity bit)

• High-speed CEMOS™ technology

• Pin, functionally and software compatible with the MIPS
  Computer Systems R2020 Write Buffer

• Speeds from 16.7 to 33.33 MHz

• Military product compliant to MIL-STD-883, Class B


DESCRIPTION: 


The IDT79R3020 Write Buffer enhances the performance of
IDT79R3000 systems by allowing the processor to perform write
operations during Run cycles instead of resorting to
time-consuming stall cycles. Each IDT79R3020 device handles an 8-bit
slice of address, and a 9-bit slice of data (one parity bit per byte);
thus, four IDT79R3020s provide 4-deep buffering of 32 bits of
address and 36 bits of data and parity. Figure 1 illustrates the
functional position of the Write Buffer in an IDT79R3000 system.

Whenever the processor performs a write operation, the Write
Buffer captures the output data and its address (including the
access type bits). The Write Buffer can hold up to four
data-address sets while it waits to pass the data on to main memory.
Transfers from the processor to the Write Buffers occur
synchronously at the cycle rate of the processor, and the Write
Buffer signals the processor if it is unable to accept data. The
Write Buffer also provides a set of handshake signals to
communicate with a main memory controller and coordinate the
transfer of write data to main memory.
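
The buffering behavior described above can be sketched as a simple
four-entry FIFO. This is a behavioral illustration only, not a model of the
device's internal logic: the class name `WriteBufferModel` and its methods
are hypothetical, the four devices are treated as one 32-bit buffer, and
the full flag is simplified to "all four entries occupied".

```python
from collections import deque


class WriteBufferModel:
    """Behavioral sketch of a 4-deep write buffer (four slices as one unit)."""

    DEPTH = 4  # four data-address sets, per the description above

    def __init__(self):
        self._entries = deque()

    @property
    def request(self):
        """True while buffered data is waiting to go to main memory."""
        return len(self._entries) > 0

    @property
    def wb_full(self):
        """True when no further write can be accepted (simplified)."""
        return len(self._entries) >= self.DEPTH

    def cpu_write(self, address, data, acc_type):
        """Capture a processor write; return False if the CPU must stall."""
        if self.wb_full:
            return False
        self._entries.append((address & 0xFFFFFFFF, data & 0xFFFFFFFF,
                              acc_type & 0x3))
        return True

    def memory_acknowledge(self):
        """Retire the oldest entry to main memory; None if the buffer is empty."""
        return self._entries.popleft() if self._entries else None


wb = WriteBufferModel()
accepted = [wb.cpu_write(0x1000 + 4 * i, i, 3) for i in range(5)]
print(accepted, wb.wb_full, wb.memory_acknowledge())
```

In this sketch the fifth write is refused until the memory side retires an
entry, which corresponds to the processor stalling while the buffer cannot
accept data.
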

The sections that follow describe these IDT79R3020 Write 
Buffer interfaces: 

• the processor-Write Buffer interface
• the Write Buffer-main memory interface
• a miscellaneous, Write Buffer-board control interface.


Figure 1. The IDT79R3020 Write Buffer in an IDT79R3000 System


WRITE BUFFER - IDT79R3000 PROCESSOR
INTERFACE


Figure 2 shows the signals comprising the Write Buffer interface
to the IDT79R3000 (all descriptions assume that four IDT79R3020
Write Buffers are used to implement a 32-bit, buffered interface).
The AdrLo bus and Tag bus bits from the processor are both
connected to the Write Buffer to form a 32-bit physical address that
is captured by the buffers. Thirty-two bits of data, four bits of parity,
and two access type bits are also captured by the Write Buffer. The
paragraphs that follow describe the Write Buffer-processor
interface signals and the timing of processor-to-Write Buffer data
transfers.


[Figure: IDT79R3000 Processor connected to the Write Buffer through
the Address Bus (AdrLo & Tag) and the Data and Parity bus; Write
Buffer signals shown are AddrIn7:0, Address1:0, DataIn8:0, AccTyp0,
AccTyp1 and Request]

Figure 2. Write Buffer - IDT79R3000 Processor Interface


Write Buffer-Processor Interface Signals 


Clock 

An inverted version of the IDT79R3000's SysOut signal from the
IDT79R3000 processor that synchronizes data transfers. The
Write Buffer uses the trailing edge of Clock to latch the contents of
the AdrLo bus and uses the leading Clock edge to latch the
contents of the Data and Tag buses.


DataIn8:0
Nine input data lines from the IDT79R3000 processor's Data bus 
(eight bits of data and one bit of parity). 


AddrIn7:0
Eight input address lines from the IDT79R3000 processor. The 
address lines are taken from the AdrLo and Tag buses. 


Address1:0 

The two least significant address bits from the IDT79R3000
processor. These two address bits must be connected to all four Write
Buffers and are used in conjunction with the access type
(AccTyp1:0) signals, the Position1:0 signals, and the BigEndian
signal to determine which byte(s) in a word are being written into a
particular Write Buffer.


AccTypIn1:0
The access type signals from the IDT79R3000 processor
specifying the size of a data access: word, tri-byte, half-word, or byte.
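
How the low address bits, the access type and the byte order could combine
to select the bytes being written can be sketched as follows. The sketch
rests on assumptions that are not stated in this section: the access type
is taken to encode the access size minus one (0 = byte, 1 = half-word,
2 = tri-byte, 3 = word), byte lane 0 is taken to be data bits 7:0, and the
big-endian case is modeled by mirroring the lane numbers. The function name
`byte_lanes` is hypothetical.

```python
def byte_lanes(address_1_0, acc_type, big_endian):
    """Return the sorted byte lanes (0 = data bits 7:0) touched by a write.

    Assumed encoding: acc_type is the access size in bytes minus one.
    """
    size = (acc_type & 0x3) + 1
    start = address_1_0 & 0x3
    if start + size > 4:
        raise ValueError("access crosses a word boundary")
    lanes = list(range(start, start + size))
    if big_endian:
        # In big-endian order, byte address 0 is the most significant lane.
        lanes = [3 - lane for lane in lanes]
    return sorted(lanes)


print(byte_lanes(0, 3, False))  # full word
print(byte_lanes(2, 1, False))  # half-word at byte offset 2
print(byte_lanes(2, 1, True))   # same access, big-endian byte order
```

Under these assumptions, each Write Buffer device would accept data only
when the byte lane it holds appears in the returned list.
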


WrtMem

This input is connected to the MemWr signal from the 
IDT79R3000 processor that is asserted whenever the processor is 
performing a store (write) operation. 


Request 

The primary purpose of this signal is to request access to
memory and is described later when the Write Buffer-Main Memory
Interface is discussed. The Request signal can also be connected
to the CpCond0 input of the IDT79R3000 and can then be tested
by software to determine if there is any data in the Write Buffer.
Since Request is deasserted if there is no data in the Write Buffer,
software can determine if a previous write operation (for example,
to an I/O device) has been completed before initiating a read or
read status operation from that device.


WbFull 

The Write Buffer asserts this signal to the IDT79R3000's WrBusy 
input whenever it cannot accept any more data; that is, when the 
current write will fill the buffer or the buffer has all address-data 
pairs occupied. The IDT79R3000 processor performs a write-busy 
stall if it needs to store data while the WbFull/WrBusy signal is 
asserted. 


Data & Address Connections 


Figure 3 illustrates how four Write Buffers are connected to the 
address and data outputs of the IDT79R3000 processor. 


Address Inputs 

Each Write Buffer device has eight address inputs (AdrIn7:0).
The four low-order bits (AdrIn3:0) are clocked into the device on
the trailing edge of the Clock signal and are taken from the
IDT79R3000's AdrLo bus. The four high-order bits (AdrIn7:4) are
clocked into the device on the rising edge of the Clock signal and
are taken from the IDT79R3000's Tag bus.

Each device also has separate inputs (Address1, Address0) for
the two low-order bits from the AdrLo bus. These bits must be
input to each device since they comprise the byte pointer. Note in
Figure 3 that the two low-order AdrIn inputs (AdrIn1:0) to Write
Buffer device 0 are connected to ground since the Address1,
Address0 inputs already supply these bits to the device.
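
The address slicing can be illustrated with a short sketch. The exact bit
assignment per device is shown in Figure 3 and is not spelled out in the
text, so the mapping below is an assumption made for illustration: device n
is taken to receive physical address bits 4n+3:4n on AdrIn3:0 and bits
16+4n+3:16+4n on AdrIn7:4. The function names are hypothetical.

```python
def slice_address(phys_addr):
    """Split a 32-bit address into four 8-bit AdrIn7:0 slices (assumed mapping)."""
    slices = []
    for dev in range(4):
        low = (phys_addr >> (4 * dev)) & 0xF        # AdrIn3:0, from AdrLo
        high = (phys_addr >> (16 + 4 * dev)) & 0xF  # AdrIn7:4, from Tag
        if dev == 0:
            low &= 0xC  # AdrIn1:0 of device 0 are tied to ground
        slices.append((high << 4) | low)
    return slices


def rebuild_address(slices, address_1_0):
    """Recombine the four slices and the separately supplied Address1:0 bits."""
    addr = 0
    for dev, value in enumerate(slices):
        addr |= (value & 0xF) << (4 * dev)
        addr |= ((value >> 4) & 0xF) << (16 + 4 * dev)
    return addr | (address_1_0 & 0x3)


addr = 0x1234ABCD
parts = slice_address(addr)
print([hex(p) for p in parts], hex(rebuild_address(parts, addr & 0x3)))
```

Because the byte pointer reaches every device through Address1:0, grounding
AdrIn1:0 on device 0 loses no information in this model: the full address
is recovered from the four slices plus the two byte-pointer bits.
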


Data Inputs 

Each Write Buffer device has nine data inputs that are clocked
into the device on the leading edge of the Clock signal and are
taken from the IDT79R3000's Data bus. In Figure 3, each device
captures eight bits of data and one bit of parity. Also note that the
data bits assigned to each device correspond to the address bits
connected to the device. This arrangement is required since data
selection is dependent on a combination of the AccType signals
and the two low order address bits. The arrangement also
simplifies system utilization of the "Read Error Address" feature
described later.


[Figure: the IDT79R3000 Processor connected to four Write Buffer
devices (0 to 3), which drive the System Address Bus and the System
Data Bus; a Read Buffer is also shown. Each device has AdrIn7:4,
AdrIn3:0, DataIn8:4, DataIn3:0, Address1:0, Position1 and Position0
inputs and Address Out and DataOut outputs. On Write Buffer 0 the
low-order address inputs are shown as AdrIn3:2 and AdrIn1:0]

Figure 3. Write Buffer Data and Address Line Connections


The Position1 and Position0 signals shown in Figure 3 specify
the nibble position within a halfword that each Write Buffer device
comprises.


Write Buffer - Processor Timing 


Transfers between the processor and the Write Buffers occur
synchronously: the Clock signal from the processor is input to the
Write Buffers and used to clock the address and data information
into the Write Buffers' latches. Figure 4 illustrates the timing for the
processor-Write Buffer interface.

When the WrtMem signal is asserted, the low-order address bits
and the Address1:0 inputs are latched on the trailing edge of the
Clock signal (1). The rising edge of Clock (2) is used to latch the
high-order address bits, the access type inputs and the contents of
the data bus.

Figure 4. Processor - Write Buffer Interface Timing


WRITE BUFFER - MAIN MEMORY INTERFACE 


Figure 5 shows the signals comprising the Write Buffer interface
to main memory. This interface is essentially decoupled from the
Write Buffer-processor interface: although some synchronization
of the memory interface signals and the Clock signal is required,
the handshaking signals in this interface have no direct connection
to the operation of the Write Buffer-processor interface.

[Figure 5 diagram: the Write Buffer (x 4) presents AddrOut, AccTypOut0, AccTypOut1, DataOut and Parity, and Request to the Main Memory Controller; the controller drives OutEn and Acknowledge.]

Figure 5. Write Buffer — Main Memory Interface 


Write Buffer - Main Memory Interface Signals 

Each Write Buffer provides the following signals that comprise
the interface to a main memory controller:

AddrOut 7:0
Eight address lines output from each Write Buffer.

DataOut 8:0
Nine data lines from each Write Buffer (eight bits of data and one
bit of parity).

AccTypOut 1:0
The access type signals from the Write Buffer specifying the size
of a data access: word, tri-byte, half-word, or byte.

OutEn
The memory controller asserts this Write Buffer input to enable the
tri-state outputs of the IDT79R3020 address and data signals.

Request
The Write Buffer asserts this signal to inform the main memory
system that it has data to be written to memory.

Acknowledge
The main memory system asserts this signal when it has captured
the data presented by the Write Buffer on the DataOut lines.


Write Buffer - Main Memory Interface Timing


Figure 6 illustrates the timing for the transfer of data from the 
Write Buffer to the main memory system. The sequence illustrated 
in this figure is as follows: 


1) When the Write Buffer has a data-address pair for transfer to
the memory system, it asserts the Request signal.

2) When the memory system is ready to handle the Write Buffer
data, it asserts the OutEn signal to enable the Write Buffers'
address and data outputs onto the system buses.

3) When the memory system no longer requires the Write Buffer
address and data outputs, it asserts the Acknowledge signal.
The Write Buffer responds to this signal by discarding the
address-data pair that was just output.

4) The memory system can deassert the OutEn signal to return
the Write Buffers' address and data outputs to their tri-state
condition.

5) Since the Request signal remains asserted, the memory system
asserts the OutEn signal again to enable the next
address-data pair onto the system buses.

6) When the memory system has accepted the second address-data
pair, it again asserts the Acknowledge signal. If the Write
Buffer is now empty, it responds to this signal by deasserting
the Request signal.

[Figure 6 timing diagram: Clock, Request, Acknowledge, OutEn, AddrOut and DataOut waveforms.]





Figure 6. Write Buffer — Main Memory Interface Timing 


Note that the buffer's interface to main memory is not completely
asynchronous: assertion of the Request signal by the Write Buffer
is synchronized with the rising edge of Clock, and the Acknowledge
signal input by main memory has a minimum setup and hold
time in relation to the Clock signal.
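
The Request / OutEn / Acknowledge sequence described for Figure 6 can be sketched as a small behavioural model. This is only an illustration under simplifying assumptions: it ignores Clock synchronization and setup/hold times, and the class and function names (`WriteBufferPort`, `drain`) are hypothetical, not part of the IDT79R3020 specification.

```python
from collections import deque


class WriteBufferPort:
    """Toy model of the memory-side signals of a Write Buffer (hypothetical names)."""

    def __init__(self, pairs):
        self.queue = deque(pairs)  # pending (address, data) pairs, oldest first
        self.out_en = False        # OutEn, driven by the memory controller

    @property
    def request(self):
        # Request stays asserted while any address-data pair is pending.
        return bool(self.queue)

    def outputs(self):
        # Outputs are tri-stated (modelled as None) unless OutEn is asserted.
        if self.out_en and self.queue:
            return self.queue[0]
        return None

    def acknowledge(self):
        # Acknowledge discards the pair that was just output.
        self.queue.popleft()


def drain(port):
    """Memory-controller side: repeat the handshake until Request deasserts."""
    written = []
    while port.request:                 # steps 1 and 5: Request is asserted
        port.out_en = True              # step 2: enable the outputs
        written.append(port.outputs())  # capture address and data
        port.acknowledge()              # steps 3 and 6: Acknowledge
        port.out_en = False             # step 4: return outputs to tri-state
    return written
```

A controller built this way never needs to know how many pairs are buffered; it only watches Request.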


MISCELLANEOUS WRITE BUFFER - BOARD 
LOGIC INTERFACE 

The Write Buffers support several functions that utilize signals
that do not fit neatly into the descriptions of either the processor or
main memory interfaces. These functions and signals typically
involve miscellaneous logic on a CPU board and include the
following:
• byte gathering
• configuration connections (Big Endian, Position 1:0)
• address matching logic
• error address latch logic

The sections that follow describe each of these categories. 


Byte Gathering 


The Write Buffers perform byte (half-word, tri-byte and word)
gathering to decrease the number of write transfers to the same
location; that is, sequential writes to the same WORD address have
their data combined into the same address-data pair buffer.

Byte gathering is prohibited in the address-data pair that is
currently available to the memory controller. Thus, the first write into
an empty Write Buffer will not have subsequent writes gathered
into it because it is currently available for output to memory. Writes
to the same location (byte) may be overwritten in the Write Buffer if
the gathering is not prohibited by the preceding rule.

The Write Buffers present address-data pairs to the main memory
controller in the sequence in which they were received from the
processor except in the case of gathered data, where bytes or half
words can be collected and written to main memory in a single
write operation. If the address-data pair buffer is scheduled to be
output, then gathering is inhibited and the buffer contents are
presented to the main memory controller. Subsequent writes are then
placed in another buffer. No reliance should be placed on any
aspect of gathering (except that it only involves sequential writes to
the same word address) as it is not readily deterministic.
Non-sequential writes to the same word address are not gathered.
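
The gathering rules above can be illustrated with a simplified model. It is a sketch only: the real devices are described as not readily deterministic, so this shows the stated rules (gather only sequential writes to the same word, never into the entry currently offered to memory) rather than actual device behaviour. The class name `GatheringBuffer` and its return strings are hypothetical.

```python
class GatheringBuffer:
    """Simplified model of byte gathering across four buffer ranks."""

    DEPTH = 4  # ranks A, B, C and D

    def __init__(self):
        # Oldest entry first; entries[0] is the pair offered to memory.
        self.entries = []

    def write(self, byte_addr, value):
        word_addr, lane = byte_addr >> 2, byte_addr & 3
        last = self.entries[-1] if self.entries else None
        # Gather only into the most recent entry (sequential writes), and
        # never into the head entry, which is already available for output.
        if (last is not None and last is not self.entries[0]
                and last["word"] == word_addr):
            last["bytes"][lane] = value  # a repeated byte is overwritten
            return "gathered"
        if len(self.entries) == self.DEPTH:
            return "full"
        self.entries.append({"word": word_addr, "bytes": {lane: value}})
        return "new entry"
```

Note that the second write in a run to the same word opens a new entry, because the first entry is the one currently available to the memory controller.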

Note that gathering can require that two main memory controller
references be used to empty a single Write Buffer entry. For
example, this can occur if Bytes 0 and 3 of a word are sequentially
written. Where order in writing is important, such as in I/O controllers,
software should avoid sequential accesses to the same word. In
cases where write-read access ordering is important but reading of
the write location is not desired, such as during I/O, then a write
followed by a write to a dummy location followed by a read of the
dummy location will ensure the first write has occurred before
continuing. Alternatively, the Request signal can be tested to
determine that the Write Buffer is empty.
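
One way to see why Bytes 0 and 3 need two references is to group the valid byte lanes of an entry into runs of adjacent lanes. The sketch below assumes that a single memory reference can only describe one contiguous run (since AccTypOut encodes a size: byte, half-word, tri-byte or word); that assumption, the function name `memory_references`, and the omission of alignment rules are all simplifications not stated in the data sheet.

```python
def memory_references(valid_bytes):
    """Split the valid byte lanes of one entry into runs of adjacent lanes.

    valid_bytes is an iterable of lane numbers (0 to 3). Each returned run
    is assumed to correspond to one main memory reference.
    """
    runs, current = [], []
    for lane in sorted(valid_bytes):
        if current and lane != current[-1] + 1:
            runs.append(current)  # gap found: close the current run
            current = []
        current.append(lane)
    if current:
        runs.append(current)
    return runs
```

Under this assumption an entry holding lanes 0 and 3 yields two runs, while an entry holding all four lanes yields one.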


Configuration Logic Connections 


Because of their byte gathering capability, each buffer device
internally maintains a record of each valid byte in an address/data
pair. To do this, each device must have a way of determining which
data bits within a word it is handling. The following signals
determine how the Write Buffers handle data that is written to the
devices:

• Position 1, Position 0 - these signals (in conjunction with Big
Endian) determine how each Write Buffer decodes the
Address 1:0 and AccType 1:0 inputs to determine if it should store
the data inputs. Refer to Figure 3 for an illustration of how
data bits are assigned to Write Buffer devices based on their
position.

• Big Endian - when asserted, byte 0 is the leftmost,
most-significant byte (big-endian); when deasserted, byte 0 is the
rightmost, least-significant byte (little-endian).

• Address 1, Address 0 - these signals (taken from the AdrLo
bus) must be connected to all buffer devices since they
determine which byte within a word is being accessed.













• AccType 1, AccType 0 - these input signals specify the data
size of a write operation as shown in Table 1.


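Table 1 shows how these signals operate to specify how bytes
are saved within the Write Buffers.

[Table 1 is only partly legible in this copy. Legible headings: Bytes Accessed, shown on a 32-bit word (bit 31 to bit 0) for Big-Endian and for Little-Endian byte order. Legible AccType 1:0 rows: 0 1 (halfword), 0 0 (byte).]

Table 1. Byte Specifications for Write Operations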


The lower two address bits of the device in position zero (as
determined by the two Position inputs) are inhibited; that is, they
are not stored directly as they are output on the AdrLo bus. Instead,
on output, the lower two address bits are generated from the
indication of the positions of the valid data bytes as determined by
the above table.
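
The relationship between access type, the two low-order address bits and byte order can be sketched as follows. The encodings 0 0 (byte) and 0 1 (halfword) come from Table 1; the encodings used here for tri-byte (1 0) and word (1 1) are assumptions, as is the function name `bytes_accessed`. The byte-to-bit mapping follows the Big Endian signal description (byte 0 leftmost when asserted, rightmost when deasserted).

```python
# AccType 1:0 -> number of bytes. The 0b10 and 0b11 entries are assumed.
SIZES = {0b00: 1, 0b01: 2, 0b10: 3, 0b11: 4}


def bytes_accessed(acc_type, address, big_endian):
    """Return the data-bus bit ranges (msb, lsb) touched by a write."""
    first = address & 3          # Address 1:0 selects the first byte
    size = SIZES[acc_type]
    if first + size > 4:
        raise ValueError("access crosses a word boundary")
    lanes = []
    for byte in range(first, first + size):
        # Big-endian: byte 0 is bits 31:24. Little-endian: byte 0 is bits 7:0.
        lsb = (3 - byte) * 8 if big_endian else byte * 8
        lanes.append((lsb + 7, lsb))
    return lanes
```

Each Write Buffer device would store its data inputs only when the bit range it handles appears in the returned list.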


MatchOut/MatchIn Logic and Read Conflicts


Whenever the processor references main memory (either a write
or a read reference), the Write Buffers compare the word address
from the CPU with the word addresses stored in the buffers. If any
word address matches, the buffers assert signals that can be used
by the main memory controller to ensure that the Write Buffer is
emptied before the read access with the conflicting address is
performed.

Figure 7 illustrates the Write Buffer signals involved in address
comparison logic. Each Write Buffer provides four output signals
(MatchOut A, B, C, and D) which correspond to the four buffer
ranks (A, B, C, D) in each device as shown in Figure 1. These
MatchOut signals can be externally NANDed as shown in Figure 7
to determine if the address being input matches those in any rank
of the Write Buffer.
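
The comparison can be sketched in software terms. Because each device holds only an 8-bit slice of the address, a rank matches only when every device reports a match for that rank. The sketch below uses plain booleans and ignores signal polarity; the assignment of address bytes to devices and the function names (`slices`, `match_out`, `conflict`) are hypothetical.

```python
def slices(address):
    """Split a word address into four 8-bit slices, one per device (assumed order)."""
    word = address & ~0x3 & 0xFFFFFFFF  # the two byte-pointer bits are ignored
    return [(word >> (8 * dev)) & 0xFF for dev in range(4)]


def match_out(rank_addresses, incoming):
    """Per-rank match: every device must match its own slice.

    rank_addresses maps a rank name to its buffered address, or None if empty.
    """
    result = {}
    for rank, stored in rank_addresses.items():
        result[rank] = stored is not None and all(
            s == i for s, i in zip(slices(stored), slices(incoming)))
    return result


def conflict(rank_addresses, incoming, is_read):
    """A read conflicts when it matches the word address held in any rank."""
    return is_read and any(match_out(rank_addresses, incoming).values())
```

A memory controller could hold off a conflicting read until the buffered write has been retired.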


[Figure 7 diagram: the MatchOut outputs of the Write Buffers are NANDed, fed back to the MatchIn inputs, and combined into a CONFLICT signal to the Main Memory Controller.]

Figure 7. Write Buffer MatchOut/MatchIn Logic

The outputs of the NAND gates are fed into the Write Buffers via the
MatchIn A, B, C, and D signals and are used within each device as
part of the byte gathering logic. The NAND gate outputs can be
NANDed together as shown in Figure 7 with the resultant signal
used (in conjunction with the processor's MEMRD signal) to alert
the main memory controller logic that there is a pending buffered
write that conflicts with a just-issued read. The main memory
controller can then delay the read access until the Request signal is
deasserted, indicating that the Write Buffer has been emptied.


Error Address Latch


The Write Buffer incorporates an internal latch that can be loaded
with one of the buffered addresses and subsequently enabled out
onto the data lines. This feature can be used by error handling
routines to read an address back from the Write Buffer and analyze or
recover from certain bus errors. Figure 8 shows the signals
involved in operation of this latch.





[Figure 8 diagram: the AddressOut path and the Error Address Latch.]

Figure 8. The Write Buffer Error Address Latch 


When the LatchErrAddr signal is asserted, the address currently
available to the address outputs of the Write Buffer is latched into
the internal latch. This address can then be output on the DataOut
lines by asserting the EnErrAdr signal so that the processor can
read the address in as data. Refer to the AC specifications for
timing parameters of the signals associated with the error address
latch.


ABSOLUTE MAXIMUM RATINGS (1, 3)

SYMBOL   RATING                                 COMMERCIAL      MILITARY        UNIT
VTERM    Terminal Voltage with Respect to GND   -0.5 to +7.0    -0.5 to +7.0    V
TA       Operating Temperature                  (not legible)   (not legible)   °C
TBIAS    Temperature Under Bias                 -55 to +125     -65 to +135     °C
TSTG     Storage Temperature (2)                -55 to +125     -65 to +150     °C
VIN      Input Voltage                          -0.5 to +7.0    -0.5 to +7.0    V


RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE

GRADE        AMBIENT TEMPERATURE   VCC
Military     -55°C to +125°C       5.0 V ± 10%
Commercial   0°C to +70°C          5.0 V ± 5%


OUTPUT LOADING FOR AC TESTING

[Load circuit diagram not recoverable.]


NOTES: 


1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
may cause permanent damage to the device. This is a stress rating
only and functional operation of the device at these or any other conditions
above those indicated in the operational sections of this specification is not
implied. Exposure to absolute maximum rating conditions for extended
periods may affect reliability.

2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed
VCC + 0.5 Volts.

3. Not more than one output should be shorted at a time. Duration of the short
should not exceed 30 seconds.


DC ELECTRICAL CHARACTERISTICS -
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0 V ± 5%)

Legible speed-grade columns: 16.67 MHz and 20.0 MHz, each with MIN. and MAX.

SYMBOL   PARAMETER              TEST CONDITIONS
VOH      Output HIGH Voltage    VCC = Min, IOH = -4mA
VOL      Output LOW Voltage     VCC = Min, IOL = 4mA
VIH      Input HIGH Voltage
VIL      Input LOW Voltage
CIN      Input Capacitance
-        Output Capacitance

[The MIN./MAX. limits for these rows are not legible in this copy.]


DC ELECTRICAL CHARACTERISTICS -
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, VCC = +5.0 V ± 10%)

SYMBOL   PARAMETER                  TEST CONDITIONS          MIN.   MAX.
VOH      Output HIGH Voltage        VCC = Min, IOH = -4mA
VOL      Output LOW Voltage         VCC = Min, IOL = 4mA
VIH      Input HIGH Voltage
VIL      Input LOW Voltage
IIH      Input HIGH Leakage                                  -      10
IIL      Input LOW Leakage                                   -10    -
-        Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V   -40    40

[Only the limits shown survive in this copy; units and the remaining limits are not legible. The OUTPUT LOADING FOR AC TESTING figure ("To Device Under Test") appeared alongside these tables.]

NOTES:

1. VIH should not be held above VCC + 0.5 Volts.

2. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for longer periods.


AC ELECTRICAL CHARACTERISTICS -
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)

[The timing table is only partly legible in this copy. Parameters are listed in table order; symbols are given where they survive, and most MIN./MAX. limits are not recoverable.]

SYMBOL    PARAMETER
-         AddrIn (3:0) to Clock falling setup
-         AddrIn (3:0) from Clock falling hold
-         Address 1:0 to Clock falling setup
-         Address 1:0 from Clock falling hold
-         Access Type 1:0 to Clock rising setup
-         Access Type 1:0 from Clock rising hold
-         AddrIn (7:4) to Clock rising setup
-         AddrIn (7:4) from Clock rising hold
-         DataIn (8:0) from Clock rising hold
-         WrtMem from Clock rising hold
-         Request from Clock rising
-         Acknowledge to Clock rising setup
-         LatchErrAddr to Acknowledge rising
-         WbFull inactive from Clock rising
t19       OutEn to AddrOut (7:0), DataOut (8:0) valid
-         OutEn to AddrOut (7:0), DataOut (8:0) tri-state
-         MatchOut (ABCD) from Clock rising
-         MatchIn (ABCD) to Clock rising setup
-         MatchIn (ABCD) from Clock rising hold
-         EnErrAdr to Data (error latch) valid
t25       EnErrAdr to Data (error latch) tri-state
t26       Address/Data out from Clock rising
t27       Reset to Clock rising, set-up
-         Reset from Clock rising, hold
-         Reset low pulse width (cycles)
t30       WbFull High from Clock rising (after Reset)
-         Request High from Reset low
-         Access TypOut 1:0 low from Reset low
t33       Match Out (ABCD) Low from Reset low
t34       Address/Data out tri-state from Reset low (OutEn negated)
-         Access TypeOut from Clock rising
tcyc      Clock Pulse Width (MAX. 2000 in each speed grade; legible MIN. values: 50, 40, 30)
tckhigh   Clock High Pulse Width
tcklow    Clock Low Pulse Width


AC ELECTRICAL CHARACTERISTICS -
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, VCC = +5.0V ± 10%)

[Speed grades: 16.67 MHz and 20.0 MHz. The legible parameter names match the commercial-range list above; the limits are not recoverable.]


Figure 9. Write Buffer Timing Specifications

Figure 10. WBFULL Signal Timing Specifications

Figure 11. OUTEN Timing Specifications

Figure 12. Match and Error Latch Timing Specifications

Figure 13. Address/Data Out, Access Type Out


PIN CONFIGURATION
PLASTIC LEADED CHIP CARRIER


INTEGRATED RISControllers™
IDT 79R3051™, 79R3051E
IDT 79R3052™, 79R3052E

Integrated Device Technology, Inc.

FEATURES:
• Instruction set compatible with IDT79R3000A and
  IDT79R3001 MIPS RISC CPUs
• High level of integration minimizes system cost, power
  consumption
  - 79R3000A/79R3001 Execution Engine
  - R3051 features 4kB of Instruction Cache
  - R3052 features 8kB of Instruction Cache
  - All devices feature 2kB of Data Cache
  - "E" Versions (Extended Architecture) feature full
    function Memory Management Unit, including 64-entry
    Translation Lookaside Buffer (TLB)
  - 4-deep write buffer eliminates memory write stalls
  - 4-deep read buffer supports burst refill
  - On-chip DMA arbiter
  - Bus Interface Minimizes Processor Stalls
• Single clock input
• Direct interface to R3720/21/22 RISChipset
• 35 MIPS, over 64,000 Dhrystones at 40 MHz
• Low cost 84-pin PLCC packaging
• Flexible bus interface allows simple, low cost designs
• 20, 25, 33, and 40 MHz operation
• Complete software support
  - Optimizing compilers
  - Real-time operating systems
  - Monitors/debuggers
  - Floating Point Software
  - Page Description Languages


INTRODUCTION

The IDT R3051 Family is a series of high-performance 32-bit
microprocessors featuring a high level of integration, and
targeted to high-performance but cost sensitive embedded
processing applications. The R3051 family is designed to
bring the high-performance inherent in the MIPS RISC
architecture into low-cost, simplified, power sensitive
applications.

Functional units were integrated onto the CPU core in order 
to reduce the total system cost, rather than to increase the 
inherent performance of the integer engine. Thus, the R3051 
family is able to offer 35 MIPS of integer performance at 40 
MHz without requiring external SRAM or caches. 

Further, the R3051 family brings dramatic power reduction 
to these embedded applications, allowing the use of low-cost 
packaging for devices up to 25 MHz. The R3051 family allows 
customer applications to bring maximum performance at 
minimum cost. 

Figure 1 shows a block level representation of the functional
units within the R3051 family. The R3051 family could be
viewed as the embodiment of a discrete solution built around
the IDT 79R3000A or 79R3001. However, by integrating this
functionality on a single chip, dramatic cost and power
reductions are achieved.

Currently, there are four members of the R3051 family.
All devices are pin and software compatible: the
differences lie in the amount of instruction cache, and in the
memory management capabilities of the processor:

• The R3052E incorporates 8kB of Instruction Cache, and
  features a full function memory management unit (MMU)
  including a 64-entry fully-associative Translation
  Lookaside Buffer (TLB). This is the same memory
  management unit incorporated in the IDT 79R3000A and
  79R3001.

• The R3052 also incorporates 8kB of Instruction Cache.
  However, the memory management unit is a much
  simpler subset of the capabilities of the enhanced
  versions of the architecture, and in fact does not use a TLB.

• The R3051E incorporates 4kB of Instruction Cache.
  Additionally, this device features the same full function
  MMU (including TLB file) as the R3052E and R3000A.

• The R3051 incorporates 4kB of Instruction Cache, and
  uses the simpler memory management model of the
  R3052.


An overview of the functional blocks incorporated in these 
devices follows. 


CPU Core 


The CPU core is a full 32-bit RISC integer execution
engine, capable of sustaining close to single cycle execution
rate. The CPU core contains a five stage pipeline, and 32
orthogonal 32-bit registers. The R3051 family implements the
MIPS-I ISA. In fact, the execution engine of the R3051 family
is the same as the execution engine of the R3000A (and
R3001). Thus, the R3051 family is binary compatible with
those CPU engines.
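
The effect of a five stage pipeline on execution rate can be visualised with a short sketch that builds a stage-occupancy chart for a stall-free instruction stream. The stage names below are assumptions chosen for illustration (the data sheet only states that there are five stages), and `pipeline_chart` and `in_flight` are hypothetical helper names.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]  # assumed stage names


def pipeline_chart(n_instructions):
    """Return one row per instruction; column c is the stage occupied in cycle c."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows


def in_flight(rows, cycle):
    """Count the instructions occupying some pipeline stage in the given cycle."""
    return sum(1 for row in rows if row[cycle] != "--")
```

With no stalls, n instructions complete in n + 4 cycles, so the rate approaches one instruction per cycle as n grows.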


The execution engine of the R3051 family uses a five-stage 
pipeline to achieve close to single cycle execution. A new 
instruction can be started in every clock cycle; the execution 
engine actually processes five instructions concurrently (in 
various pipeline stages). Figure 2 shows the concurrency 
achieved by the R3051 family pipeline.

Figure 2. R3051 Family 5-Stage Pipeline


System Control Co-Processor 


The R3051 family also integrates on-chip the System
Control Co-processor, CP0. CP0 manages both the exception
handling capability of the R3051 family, as well as the
virtual to physical mapping of the R3051 family.

There are two versions of the R3051 family architecture:
the Extended Architecture Versions (the R3051E and R3052E)
contain a fully associative 64-entry TLB which maps 4kB
virtual pages into the physical address space. The virtual to
physical mapping thus includes kernel segments which are
hard mapped to physical addresses, and kernel and user
segments which are mapped on a page basis by the TLB into
anywhere within the 4GB physical address space. In this TLB,
8 page translations can be "locked" by the kernel to ensure
deterministic response in real-time applications. These
versions thus use the same MMU structure as that found in the
IDT 79R3000A and 79R3001. Figure 3 shows the virtual to
physical address mapping found in the extended architecture
versions of the processor family.
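
The split between hard-mapped and TLB-mapped segments can be sketched as an address classifier. The boundaries 0x80000000 and 0xc0000000 and the names kuseg and kseg0 appear in Figure 3; the 0xa0000000 boundary and the names kseg1 and kseg2 are assumptions based on the usual MIPS convention, and the helper names are hypothetical.

```python
PAGE_SHIFT = 12  # 4kB virtual pages


def classify(vaddr):
    """Return (segment, mapped_by_tlb) for a 32-bit virtual address."""
    if vaddr < 0x80000000:
        return "kuseg", True    # user, mapped on a page basis
    if vaddr < 0xA0000000:
        return "kseg0", False   # kernel, cached, hard mapped
    if vaddr < 0xC0000000:
        return "kseg1", False   # assumed: kernel, uncached, hard mapped
    return "kseg2", True        # kernel, mapped on a page basis


def split_page(vaddr):
    """Split an address into (virtual page number, offset within the page)."""
    return vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
```

Only addresses in the TLB-mapped segments would need a virtual page number lookup; the offset within the 4kB page passes through unchanged.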

The Extended Architecture devices allow the system
designer to implement kernel software to dynamically manage
User task utilization of memory resources, and also allow the
Kernel to effectively "protect" certain resources from user
tasks. These capabilities are important in a number of
embedded applications, from process control (where resource
protection may be extremely important) to X-Window display
systems (where virtual memory management is extremely
important).

[Figure 3 diagram: the VIRTUAL address space from 0x00000000 to 0xffffffff, divided into User Mapped Cacheable (kuseg) from 0x00000000, Kernel Cached (kseg0) from 0x80000000, and Kernel Mapped from 0xc0000000, shown against PHYSICAL memory. The remaining labels are not recoverable.]

Figure 3. Virtual to Physical Mapping of Extended Architecture Versions 


The base versions of the architecture (the R3051 and
R3052) remove the TLB and institute a fixed address mapping
for the various segments of the virtual address space. The
base processors support distinct kernel and user mode
operation without requiring page management software, leading
to a simpler software model. The memory mapping used by
these devices is illustrated in Figure 4. Note that the reserved
address spaces shown are for compatibility with future family
members.


When using the base versions of the architecture, the
system designer can implement a distinction between the
user tasks and the kernel tasks, without having to execute
page management software. This distinction can take the
form of physical memory protection, accomplished by address
decoding, or other forms. In systems which do not wish to
implement memory protection, and wish to have the kernel
and user tasks operate out of a single unified memory space,
upper address lines can be ignored by the address decoder,
and thus all references will be seen in the lower gigabyte of the
physical address space.

Figure 4. Virtual to Physical Mapping of Base Architecture Versions


Clock Generation Unit


The R3051 family is driven from a single input clock.
On-chip, the clock generator unit is responsible for managing the
interaction of the CPU core, caches, and bus interface. The
clock generator unit replaces the external delay line required
in R3000A and R3001 based applications.


Instruction Cache 


The current family includes two different instruction cache
sizes: the R3051 and R3051E feature 4kB
of instruction cache, and the R3052 and R3052E each
incorporate 8kB of Instruction Cache. For all four devices, the
instruction cache is organized with a line size of 16 bytes (four
words). This relatively large cache achieves a hit rate well in
excess of 95% in most applications, and substantially
contributes to the performance inherent in the R3051 family. The
cache is implemented as a direct mapped cache, and is
capable of caching instructions from anywhere within the 4GB
physical address space. The cache is implemented using
physical addresses (rather than virtual addresses), and thus
does not require flushing on context switch.


Data Cache 


All four devices incorporate an on-chip data cache of 2kB,
organized with a line size of 4 bytes (one word). This relatively
large data cache achieves hit rates well in excess of 90% in
most applications, and contributes substantially to the
performance inherent in the R3051 family. As with the instruction
cache, the data cache is implemented as a direct mapped
physical address cache. The cache is capable of mapping any
word within the 4GB physical address space.
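
For a direct mapped physical cache, the location of a word follows directly from the cache size and line size given above. The sketch below shows the generic calculation; it is not taken from the data sheet, and the function name `cache_slot` is hypothetical.

```python
def cache_slot(paddr, cache_bytes, line_bytes):
    """Split a physical address into (tag, index, offset) for a direct mapped cache."""
    lines = cache_bytes // line_bytes        # number of lines in the cache
    offset = paddr % line_bytes              # byte within the line
    index = (paddr // line_bytes) % lines    # which line the address maps to
    tag = paddr // cache_bytes               # distinguishes addresses sharing a line
    return tag, index, offset
```

Two addresses that differ by a multiple of the cache size share an index and therefore displace each other, which is the usual cost of a direct mapped organization.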

The data cache is implemented as a write through cache,
to ensure that main memory is always consistent with the
internal cache. In order to minimize processor stalls due to
data write operations, the bus interface unit incorporates a
4-deep write buffer which captures address and data at the
processor execution rate, allowing it to be retired to main
memory at a much slower rate without impacting system
performance.
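
The benefit of a 4-deep write buffer can be estimated with a small queueing sketch that counts the cycles a processor would wait when the buffer is full. It assumes a fixed retire time per write and a memory that retires one write at a time; these assumptions and the function name `stall_cycles` are illustrative, not taken from the data sheet.

```python
def stall_cycles(store_cycles, retire_time, depth=4):
    """Count cycles the processor waits because the write buffer is full.

    store_cycles: sorted cycle numbers at which stores would issue without stalls.
    retire_time:  cycles the memory system needs to retire one buffered write.
    """
    finish = []  # cycle at which each accepted write leaves the buffer
    delay = 0    # accumulated stall cycles
    for issue in store_cycles:
        now = issue + delay
        pending = sorted(f for f in finish if f > now)
        if len(pending) >= depth:
            # Wait until enough entries retire to free one slot.
            wait_until = pending[len(pending) - depth]
            delay += wait_until - now
            now = wait_until
        # Memory retires writes one after another, in order.
        start = max(now, finish[-1]) if finish else now
        finish.append(start + retire_time)
    return delay
```

Short bursts of stores are absorbed without stalls; only a burst longer than the buffer depth, issued faster than memory retires it, makes the processor wait.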


Bus Interface Unit 


The R3051 family uses its large internal caches to provide 
the majority of the bandwidth requirements of the execution 
engine, and thus can utilize a simple bus interface connected 
to slow memory devices. 

The R3051 family bus interface utilizes a 32-bit address 
and data bus multiplexed onto a single set of pins. The bus 
interface unit also provides an ALE signal to de-multiplex the 
A/D bus, and simple handshake signals to process processor 
read and write requests. In addition to the read and write 
interface, the R3051 family incorporates a DMA arbiter, to 
allow an external master to control the external bus. 

The R3051 family incorporates a 4-deep write buffer to
decouple the speed of the execution engine from the speed of
the memory system. The write buffers capture and FIFO
processor address and data information in store operations,
and present it to the bus interface as write transactions at the
rate the memory system can accommodate. Figure 5 illustrates
a basic write transaction for the R3051/52.

Figure 5. IDT R3051 Family Write Operation (Two Bus Wait Cycles)

The R3051/52 read interface performs both single word
reads and quad word reads. Single word reads work with a
simple handshake, and quad word reads can either utilize the
simple handshake (in lower performance, simple systems) or
utilize a tighter timing mode when the memory system can
burst data at the processor clock rate. Thus, the system
designer can choose to utilize page or nibble mode DRAMs
(and possibly use interleaving), if desired, in high-performance
systems, or use simpler techniques to reduce complexity.
Figure 6 illustrates a basic single word read; Figure 7 illustrates
a burst block transfer. More aggressive designs could
significantly reduce the number of processor stall cycles from those
shown here.





In order to accommodate slower quad word reads, the
R3051 family incorporates a 4-deep read buffer FIFO, so that
the external interface can queue up data within the processor
before releasing it to perform a burst fill of the internal caches.
Figure 8 shows the action of the processor for a "throttled"
quad word read. Depending on the cost vs. performance
tradeoffs appropriate to a given application, the system design
engineer could include true burst support from the DRAM to
provide for high-performance cache miss processing, or
utilize the read buffer to process quad word reads from slower
memory systems.

Figure 6. IDT R3051 Family Single Word Read Operation (Two Bus Wait Cycles)

Figure 7. IDT R3051 Family Burst Read Operation (Two Bus Wait Cycles)

Figure 8. IDT R3051 Family Throttled Quad Read Operation (Three Bus Wait Cycles, One Bus Wait Cycle Between Words)

IDT 79R3051 FAMILY OF INTEGRATED RiSControllers 


ADVANCE INFORMATION 





SYSTEM USAGE 

The IDT R3051 family has been specifically designed to
easily connect to low-cost memory systems. Typical low-cost
memory systems utilize slow EPROMs, DRAMs, and application
specific peripherals. These systems may also typically
contain large, slow static RAMs, although the IDT R3051
family has been designed to not specifically require the use of
external SRAMs.

Figure 9 shows a typical system block diagram. Transparent
latches are used to de-multiplex the R3051/52 address
and data busses from the A/D bus. The data paths between
the memory system elements and the R3051/52 A/D bus are
managed by simple octal devices. A small set of simple PALs
can be used to control the various data path elements, and to
control the handshake between the memory devices and the
R3051/52.

Alternately, the memory interface can be constructed using
the IDT R3051 family RISChipset, which includes DRAM
control, data path control for interleaved memories, and other
general memory and system interface control functions. These
devices are described in separate data sheets. Figure 10
illustrates a simple system constructed using the R3051
family support chip set.

[Figure 9 diagram; legible signal labels: Reset, Clk2xIn, Int(5:0).]

Figure 9. Typical R3051 Family Based System

Figure 10. R3051 Family Chip Set Based System


DEVELOPMENT SUPPORT

The IDT R3051 family is supported by a rich set of
development tools, ranging from system simulation tools through
prom monitor support, logic analysis tools, and sub-system
modules.

Figure 11 is an overview of the system development
process typically used when developing R3051 family-based
applications. The R3051 family is supported by powerful tools
through all phases of project development. These tools allow
timely, parallel development of hardware and software for
R3051/52 based applications, and include tools such as:

• A program, Cache-3051, which allows the performance of
  an R3051 family based system to be modeled and
  understood without requiring actual hardware.

• Sable, an instruction set simulator.

• Optimizing compilers from MIPS, the acknowledged
  leader in optimizing compiler technology.

• IDT Cross development tools, available in a variety of
  development environments.

• The high-performance IDT floating point library software.

• The IDT Evaluation Board, which includes RAM,
  EPROM, I/O, and the IDT Prom Monitor.

• The IDT Laser Printer System board, which directly
  drives a low-cost print engine, and runs Microsoft
  TrueImage™ Page Description Language on top of
  PeerlessPage™ Advanced Printer Controller BIOS.

• Adobe PostScript™ Page Description Language, ported
  to the R3000 instruction set, runs on the IDT R3051
  family.

• The IDT Prom Monitor, which implements a full prom
  monitor (diagnostics, remote debug support, peek/poke,
  etc.).


[Figure 11 diagram. Labels recoverable from the scan: DBG Debugger, 
MIPS Compiler Suite, Floating Point Library, Cross Development Tools, 
Cache-R305x, Hardware Models, General CAD Tools, RISC Sub-systems, 
Evaluation Board, Laser Printer System, Logic Analysis, Diagnostics, 
IDT PROM Monitor, Remote Debug, Real-Time OS.] 

Figure 11. R3051 Family Development Toolchain 
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PERFORMANCE OVERVIEW 


The R3051 family achieves a very high level of performance. 
This performance is based on: 

* An efficient execution engine. The CPU performs ALU 
  operations and store operations at a single cycle rate, 
  and has an effective load time of 1.3 cycles and a branch 
  execution rate of 1.5 cycles (based on the ability of the 
  compilers to avoid software interlocks). Thus, the 
  execution engine achieves over 35 MIPS performance 
  when operating out of cache. 

* Large on-chip caches. The R3051 family contains 
  caches which are substantially larger than those on the 
  majority of today’s embedded microprocessors. These 
  large caches minimize the number of bus transactions 
  required, and allow the R3051 family to achieve actual 
  sustained performance very close to its peak execution 
  rate. 

* Autonomous multiply and divide operations. The 
  R3051 family features an on-chip integer multiplier/divide 
  unit which is separate from the other ALU. This allows 
  the R3051 family to perform multiply or divide operations 
  in parallel with other integer operations, using a single 
  multiply or divide instruction rather than “step” operations. 

* Integrated write buffer. The R3051 family features a 
  four-deep write buffer, which captures store target 
  addresses and data at the processor execution rate and 
  retires them to main memory at the slower main memory 
  access rate. Use of on-chip write buffers eliminates the 
  need for the processor to stall when performing store 
  operations. 

* Burst read support. The R3051 family enables the 
  system designer to utilize page mode or nibble mode 
  RAMs when performing read operations to minimize the 
  main memory read penalty and increase the effective 
  cache hit rates. 
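
The effect of the four-deep write buffer can be explored with a small queue model. This is a rough sketch under stated assumptions, not a description of the actual hardware: the names are hypothetical, the CPU is assumed to attempt one store every cycle, and main memory is assumed to take a fixed number of cycles per write.

```python
from collections import deque


class WriteBuffer:
    """FIFO of pending stores; depth 4 matches the text above."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = deque()

    def store(self, address, data):
        """CPU side: True if the store was accepted, False if the CPU must stall."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((address, data))
        return True

    def retire(self):
        """Memory side: remove and return the oldest entry, or None if empty."""
        return self.entries.popleft() if self.entries else None


def count_stalls(n_stores, memory_write_cycles, depth=4):
    """Count stall cycles when the CPU issues n_stores back to back.

    memory_write_cycles is an assumed fixed cost of one main memory write.
    """
    buf = WriteBuffer(depth)
    stalls = 0
    issued = 0
    busy = 0  # cycles left on the write that memory is currently performing
    while issued < n_stores:
        # Memory side: start a write if idle, then advance it by one cycle.
        if busy == 0 and buf.entries:
            busy = memory_write_cycles
        if busy > 0:
            busy -= 1
            if busy == 0:
                buf.retire()
        # CPU side: try to issue the next store this cycle.
        if buf.store(issued, 0):
            issued += 1
        else:
            stalls += 1
    return stalls


burst_of_four = count_stalls(4, memory_write_cycles=3)
burst_of_eight = count_stalls(8, memory_write_cycles=3)
single_entry = count_stalls(4, memory_write_cycles=3, depth=1)
```

In this model a burst of four stores is absorbed without stalling, while a longer sustained run of stores eventually fills the buffer. A one-entry buffer stalls on the same short burst, which illustrates why the depth matters.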

These techniques combine to allow the processor to achieve 
35 MIPS integer performance, and over 64,000 dhrystones at 
40 MHz without the use of external caches or zero wait-state 
memory devices. 
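
The relationship between the per-instruction cycle counts quoted above and the MIPS figure can be checked with a short calculation. The cycle counts (1.0, 1.3, 1.5) and the 40 MHz clock come from the text; the instruction mix is purely an assumption chosen for illustration, and real workloads will differ.

```python
def effective_cpi(mix, cycles):
    """Weighted average cycles per instruction for a given instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


# Cycle counts as stated in the text.
CYCLES = {"alu_or_store": 1.0, "load": 1.3, "branch": 1.5}

# Assumed instruction mix (fractions sum to 1.0); not from the data sheet.
MIX = {"alu_or_store": 0.65, "load": 0.20, "branch": 0.15}

CLOCK_MHZ = 40.0

cpi = effective_cpi(MIX, CYCLES)
mips = CLOCK_MHZ / cpi
print(f"effective CPI = {cpi:.3f}, estimated MIPS = {mips:.1f}")
```

With this assumed mix the estimate lands slightly above 35 MIPS when running from cache, consistent with the figure quoted above. Cache misses and bus traffic are not modeled.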
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PIN DESCRIPTION: 


PIN NAME | I/O | DESCRIPTION 

A/D(31:0) | I/O | Address/Data: A 32-bit time multiplexed bus which indicates the desired address 
    for a bus transaction in one cycle, and which is used to transmit data between this device 
    and external memory resources on other cycles. 

    Bus transactions on this bus are logically separated into two phases: during the first phase, 
    information about the transfer is presented to the memory system to be captured using the 
    ALE output. This information consists of: 

        Address(31:4): The high-order address for the transfer is presented. 
        BE(3:0):       These strobes indicate which bytes of the 32-bit bus will be involved in 
                       the transfer. 

    During write cycles, the bus contains the data to be stored and is driven from the internal 
    write buffer. On read cycles, the bus receives the data from the external resource, in either 
    a single word transaction or in a burst of four words, and places it into the on-chip read 
    buffer. 

Addr(3:2) | O | Low Address (3:2): A 2-bit bus which indicates which word is currently expected by 
    the processor. Specifically, this two bit bus presents either the address bits for the single 
    word to be transferred (writes or single word reads) or functions as a two bit counter 
    starting at '00' for burst read operations. 

Diag(1) | O | Diagnostic Pin 1: This output indicates whether the current bus read transaction is 
    due to an on-chip cache miss, and also presents part of the miss address. The value output 
    on this pin is time multiplexed: 

        Cached: During the phase in which the A/D bus presents address information, this 
            pin is an active high output which indicates whether the current read is 
            a result of a cache miss. The value of this pin at this time in other than 
            read cycles is undefined. 
        Miss Address (3): During the remainder of the read operation, this output presents 
            address bit (3) of the address the processor was attempting to 
            reference when the cache miss occurred. Regardless of whether a 
            cache miss is being processed, this pin reports the transfer address 
            during this time. 

Diag(0) | O | Diagnostic Pin 0: This output distinguishes cache misses due to instruction 
    references from those due to data references, and presents the remaining bit of the miss 
    address. The value output on this pin is also time multiplexed: 

        I/D: If the “Cached” Pin indicates a cache miss, then a high on this pin at this 
            time indicates an instruction reference, and a low indicates a data 
            reference. If the read is not due to a cache miss but rather an uncached 
            reference, then this pin is undefined during this phase. 
        Miss Address (2): During the remainder of the read operation, this output presents 
            address bit (2) of the address the processor was attempting to 
            reference when the cache miss occurred. Regardless of whether a 
            cache miss is being processed, this pin reports the transfer address 
            during this time. 

ALE | O | Address Latch Enable: Used to indicate that the A/D bus contains valid address 
    information for the bus transaction. This signal is used by external logic to capture the 
    address for the transfer. 

DataEn | O | Data Input Enable: This signal indicates that the A/D bus is no longer being driven 
    by the processor during read cycles, and thus the external memory system may enable the 
    drivers of the memory system onto this bus without having a bus conflict occur. During write 
    cycles, or when no bus transaction is occurring, this signal is negated. 
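
The address-phase fields and diagnostic pins described in the table above can be decoded as in the following sketch. Several details are assumptions made for illustration and are not stated in this text: that BE(3:0) occupy the low four bits of the A/D word during the address phase, that values are passed as plain integers, and all function names. Strobe polarity is not interpreted; the raw bits are returned.

```python
def decode_address_phase(ad):
    """Split a 32-bit address-phase A/D word into its two fields.

    Assumes BE(3:0) are carried on the low four bits (an assumption of
    this sketch); Address(31:4) is returned with the low bits cleared.
    """
    return {"address_31_4": ad & 0xFFFFFFF0, "be": ad & 0xF}


def burst_word_addresses(ad):
    """Word addresses of a four-word burst read.

    Addr(3:2) acts as a two bit counter starting at '00', so the four
    words are fetched in ascending order within the 16-byte block.
    """
    base = ad & 0xFFFFFFF0
    return [base | (count << 2) for count in range(4)]


def decode_diag(diag1, diag0, is_read=True):
    """Interpret Diag(1:0) as sampled during the address phase."""
    if not is_read:
        return "undefined"            # only meaningful on read cycles
    if not diag1:
        return "uncached read"        # Cached low: Diag(0) is undefined
    return "instruction cache miss" if diag0 else "data cache miss"


fields = decode_address_phase(0x0000ABC3)
words = burst_word_addresses(0x0000ABC3)
kind = decode_diag(1, 0)
```

A logic analyzer script or simulation testbench could apply the same decoding to captured bus traces in order to separate instruction refills from data refills.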
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PIN DESCRIPTION (Continued): 
PIN NAME | I/O | DESCRIPTION 

Burst/WrNear | O | Burst Transfer/Write Near: On read transactions, this signal indicates that the 
    current bus read is requesting a block of four contiguous words from memory. This signal is 
    asserted only in read cycles due to cache misses; it is asserted for all I-Cache miss read 
    cycles, and for D-Cache miss read cycles if selected at device reset time. 

    On write transactions, this output tells the external memory system that the bus interface 
    unit is performing back-to-back write transactions to an address within the same 256 byte 
    page as the prior write transaction. This signal is useful in memory systems which employ 
    page mode or static column DRAMs. 

Rd | O | Read: An output which indicates that the current bus transaction is a read. 

Wr | O | Write: An output which indicates that the current bus transaction is a write. 

Ack | I | Acknowledge: An input which indicates to the device that the memory system has 
    sufficiently processed the bus transaction, and that the CPU may either advance to the next 
    write buffer entry or process the read data. 

RdCEn | I | Read Buffer Clock Enable: An input which indicates to the device that the memory 
    system has placed valid data on the A/D bus, and that the processor may move the data into 
    the on-chip Read Buffer. 

SysClk | O | System Reference Clock: An output from the CPU which reflects the timing of the 
    internal processor sys clock. This clock is used to control state transitions in the read 
    buffer, write buffer, memory controller, and bus interface unit. 

BusReq | I | DMA Arbiter Bus Request: An input to the device which requests that the CPU tri-state 
    its bus interface signals so that they may be driven by an external master. 

BusGnt | O | DMA Arbiter Bus Grant: An output from the CPU used to acknowledge that a BusReq has 
    been detected, and that the bus is relinquished to the external master. 

SBrCond(3:2), BrCond(1:0) | I | Branch Condition Port: These external signals are internally 
    connected to the CPU signals CpCond(3:0). These signals can be used by the branch on 
    co-processor condition instructions as input ports. There are two types of Branch Condition 
    inputs: the SBrCond inputs have special internal logic to synchronize the inputs, and thus 
    may be driven by asynchronous agents. The direct Branch Condition inputs must be driven 
    synchronously. 

BusError | I | Bus Error: Input to the bus interface unit to terminate a bus transaction due to an 
    external bus error. This signal is only sampled during read and write operations. If the bus 
    transaction is a read operation, then the CPU will take a bus error exception. 

SInt(2:0), Int(5:3) | I | Processor Interrupt: During operation, these signals are logically the 
    same as the Int(5:0) signals of the R3000. During processor reset, these signals perform mode 
    initialization of the CPU, but in a different (simpler) fashion than the interrupt signals 
    of the R3000. 

    There are two types of interrupt inputs: the SInt inputs are internally synchronized by the 
    processor, and may be driven by an asynchronous external agent. The other interrupt inputs 
    are not internally synchronized. The direct interrupt inputs have one cycle lower latency 
    than the synchronized interrupts. 

Clk2xIn | I | Master Clock Input: This is a double frequency input used to control the timing of 
    the CPU. Internally, the clock generator unit derives the four processor “2xclk” signals 
    from this clock. 

Reset | I | Master Processor Reset: This signal initializes the CPU. Mode selection is performed 
    during the last cycle of reset. 

Rsvd(4:0) | I/O | Reserved: These five signal pins are reserved for testing and for future 
    revisions of this device. Users must not connect these pins. 
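
The write-near condition described for write transactions in the table above amounts to comparing the 256-byte page of consecutive write addresses. The sketch below models only that comparison. It assumes every write in the list is issued back to back with the previous one, and the function names are hypothetical.

```python
PAGE_BYTES = 256  # page size stated in the pin description


def same_page(prev_addr, addr, page_bytes=PAGE_BYTES):
    """True if both byte addresses fall within the same page."""
    return prev_addr // page_bytes == addr // page_bytes


def wr_near_trace(write_addresses):
    """For each write, report whether it would qualify as a 'near' write.

    The first write has no predecessor, so it is never near. All writes
    are assumed to be back to back (an assumption of this sketch).
    """
    flags = []
    prev = None
    for addr in write_addresses:
        flags.append(prev is not None and same_page(prev, addr))
        prev = addr
    return flags


flags = wr_near_trace([0x1000, 0x10FC, 0x1100, 0x1104])
```

A page mode or static column DRAM controller could use such an indication to skip the row address phase for the second and fourth writes in this trace, since the row is already open.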
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ORDERING INFORMATION 


IDT  XXXXX   -  XX      X        X 
     Device     Speed   Package  Process/ 
     Type                        Temp. Range 

Process/Temp. Range: 
    Blank      Commercial Temperature Range 
    'B'        Compliant to MIL-STD-883, Class B 
    'M'        Military Temperature Range Only 

Package: 
    'J'        84-Pin PLCC 
    'QSJ'      84-Pin J-Bend Cerpack 

Speed: 
    '20'       20.0 MHz 
    '25'       25.0 MHz 
    '33'       33.33 MHz 
    '40'       40.0 MHz 

Device Type: 
    79R3051    4kB Instruction Cache, No TLB 
    79R3051E   4kB Instruction Cache, With TLB 
    79R3052    8kB Instruction Cache, No TLB 
    79R3052E   8kB Instruction Cache, With TLB 
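
The ordering code can be interpreted mechanically, as in this minimal sketch. It decodes only the device type and speed fields, assuming two-digit speed codes of 20, 25, 33, and 40; the package and process/temperature letters are returned as an uninterpreted suffix. The function name and the exact string layout accepted by the pattern are assumptions for illustration.

```python
import re

# Device type -> (instruction cache size, TLB present), per the list above.
DEVICE_TYPES = {
    "79R3051": ("4kB", False),
    "79R3051E": ("4kB", True),
    "79R3052": ("8kB", False),
    "79R3052E": ("8kB", True),
}

# Assumed two-digit speed codes and their clock rates in MHz.
SPEEDS_MHZ = {"20": 20.0, "25": 25.0, "33": 33.33, "40": 40.0}

_PATTERN = re.compile(r"(?:IDT)?\s*(79R305[12]E?)\s*-\s*(\d{2})(.*)")


def parse_order_code(code):
    """Decode device type and speed from an ordering code string."""
    match = _PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized ordering code: {code!r}")
    device, speed, suffix = match.groups()
    if speed not in SPEEDS_MHZ:
        raise ValueError(f"unknown speed code: {speed!r}")
    icache, has_tlb = DEVICE_TYPES[device]
    return {
        "device": device,
        "icache": icache,
        "tlb": has_tlb,
        "mhz": SPEEDS_MHZ[speed],
        "suffix": suffix.strip(),  # package / process letters, not decoded
    }


part = parse_order_code("IDT79R3052E-33")
```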











PACKAGE INFORMATION 


PACKAGE DIMENSIONS 
84-LEAD CERQUAD (J BEND) 
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PACKAGE DIMENSIONS 
84-LEAD FLATPACK (CAVITY DOWN) 

[Package outline drawing with Detail A. Dimension callouts are not 
recoverable from the scan.] 

NOTES: 

1. All dimensions are in inches, unless otherwise specified. 
2. BSC - Basic lead spacing between centers. 
3. Cross hatched area indicates integral metallic heat sink. 
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PACKAGE DIMENSIONS 
84-PIN PGA (CAVITY DOWN) 

[Package outline drawing, top view. Dimension callouts are not 
recoverable from the scan.] 

144 Pin PGA 

[Package outline drawing. Dimension callouts are not recoverable 
from the scan.] 

NOTES: 

1. All dimensions are in inches, unless otherwise stated. 
2. BSC - Basic Pin Spacing between centers. 
3. Extra Pin (0-4) electrically connected to D-3. 
4. Index mark indicates approx. location. 


PACKAGE DIMENSIONS 
172-LEAD QUAD FLATPACK 

[Package outline drawing with Detail A. Dimension callouts are not 
recoverable from the scan.] 

Integrated Device Technology 
3236 Scott Blvd. 
Santa Clara, CA 95054-3090 
(408) 727-6116 
FAX: (408) 492-8674 

© 1990 Integrated Device Technology, Inc. 
Printed in U.S.A. 








